Abstract. Consider a system $\Psi$ of non-constant affine-linear forms $\psi_1, \ldots, \psi_t : \mathbb{Z}^d \to \mathbb{Z}$, no two of which are linearly dependent. Let $N$ be a large integer, and let $K \subseteq [-N, N]^d$ be convex. A generalisation of a famous and difficult open conjecture of Hardy and Littlewood predicts an asymptotic, as $N \to \infty$, for the number of integer points $n \in \mathbb{Z}^d \cap K$ for which the integers $\psi_1(n), \ldots, \psi_t(n)$ are simultaneously prime. This implies many other well-known conjectures, such as the twin prime conjecture and the (weak) Goldbach conjecture. It also allows one to count the number of solutions in a convex range to any simultaneous linear system of equations, in which all unknowns are required to be prime.

In this paper we (conditionally) verify this asymptotic under the assumption that no two of the affine-linear forms $\psi_1, \ldots, \psi_t$ are affinely related; this excludes the important "binary" cases such as the twin prime or Goldbach conjectures, but does allow one to count "non-degenerate" configurations such as arithmetic progressions. Our result assumes two families of conjectures, which we term the inverse Gowers-norm conjecture (GI(s)) and the Möbius and nilsequences conjecture (MN(s)), where $s \in \{1, 2, \ldots\}$ is the complexity of the system and measures the extent to which the forms $\psi_i$ depend on each other. The case $s = 0$ is somewhat degenerate, and follows from the prime number theorem in APs.

Roughly speaking, the inverse Gowers-norm conjecture GI(s) asserts that the Gowers $U^{s+1}$-norm of a function $f : [N] \to [-1, 1]$ is large if and only if $f$ correlates with an $s$-step nilsequence, while the Möbius and nilsequences conjecture MN(s) asserts that the Möbius function $\mu$ is strongly asymptotically orthogonal to $s$-step nilsequences of a fixed complexity. These conjectures have long been known to be true for $s = 1$ (essentially by work of Hardy-Littlewood and Vinogradov), and were established for $s = 2$ in two papers of the authors. Thus our results in the case of complexity $s \le 2$ are unconditional.

In particular we can obtain the expected asymptotics for the number of 4-term progressions $p_1 < p_2 < p_3 < p_4 \le N$ of primes, and more generally for any (non-degenerate) problem involving two linear equations in four prime unknowns.
1. Introduction



A Generalised Hardy-Littlewood Conjecture. Let $\mathcal{P} := \{2, 3, 5, \ldots\} \subset \mathbb{Z}$ denote the prime numbers. We refer to the lattice points $(p_1, \ldots, p_t) \in \mathcal{P}^t$ as prime points in $\mathbb{Z}^t$. A basic problem in additive number theory is to count the number of prime points on a given affine sublattice of $\mathbb{Z}^t$ in a given range. For instance, the twin prime conjecture asserts that the number of prime points in $\{(n, n+2) : n \in \mathbb{Z}\} \subset \mathbb{Z}^2$ is infinite. When the affine lattice is formed by intersecting $\mathbb{Z}^t$ with an affine subspace, this problem is equivalent to finding solutions to simultaneous linear equations in which all unknowns are prime. To formalise these types of problems more concretely, it is convenient to parameterise this lattice by affine-linear forms in $d$ integer variables, as follows.

Definition 1.1 (Affine-linear forms). Let $d, t \ge 1$ be integers. An affine-linear form on $\mathbb{Z}^d$ is a function $\psi : \mathbb{Z}^d \to \mathbb{Z}$ which is the sum $\psi = \dot\psi + \psi(0)$ of a linear form $\dot\psi : \mathbb{Z}^d \to \mathbb{Z}$ and a constant $\psi(0) \in \mathbb{Z}$. A system of affine-linear forms on $\mathbb{Z}^d$ is a collection $\Psi = (\psi_1, \ldots, \psi_t)$ of affine-linear forms on $\mathbb{Z}^d$. To avoid trivial degeneracies we shall require that all the affine-linear forms are non-constant and no two forms are rational multiples of each other. The entire system $\Psi$ can be thought of as an affine-linear map from $\mathbb{Z}^d$ to $\mathbb{Z}^t$, which is the sum $\Psi = \dot\Psi + \Psi(0)$ of a linear map $\dot\Psi : \mathbb{Z}^d \to \mathbb{Z}^t$ and a constant $\Psi(0) \in \mathbb{Z}^t$; we refer to the range $\Psi(\mathbb{Z}^d)$ of this map as an affine sublattice of $\mathbb{Z}^t$. We extend $\Psi$ (and $\dot\Psi$) in the obvious manner to an affine-linear map from $\mathbb{R}^d$ to $\mathbb{R}^t$. If $N > 0$, we define the size $\|\Psi\|_N$ of $\Psi$ relative to the scale $N$ to be the quantity

$$\|\Psi\|_N := \sum_{i=1}^{t} \sum_{j=1}^{d} |\dot\psi_i(e_j)| + \sum_{i=1}^{t} \left|\frac{\psi_i(0)}{N}\right| \tag{1.1}$$

where $e_1, \ldots, e_d$ is the standard basis for $\mathbb{Z}^d$.



Example 1. The line $\{(n, n+2) : n \in \mathbb{Z}\}$ is the affine lattice associated to the system $\Psi : n \mapsto (n, n+2)$ with $d = 1$ and $t = 2$. This example has bounded size for any $N \ge 1$. The system $\Psi : n \mapsto (n, N - n)$ counts pairs of primes which sum to $N$, and has bounded size at scale $N$.
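To make the notion of size concrete, here is a minimal sketch that evaluates $\|\Psi\|_N$ for the two systems of Example 1. It assumes that (1.1) has the form $\sum_i \sum_j |\dot\psi_i(e_j)| + \sum_i |\psi_i(0)/N|$. It also encodes a system as a matrix of linear coefficients plus a vector of constant terms. The helper names (`system_size`, `twin_system`, `goldbach_system`) are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def system_size(coeffs, consts, N):
    """Size ||Psi||_N of a system of affine-linear forms (assumed form of (1.1)).

    coeffs[i][j] is the coefficient of n_j in psi_i, i.e. dot-psi_i(e_j);
    consts[i] is the constant term psi_i(0). Fraction keeps the value exact.
    """
    linear_part = sum(abs(c) for row in coeffs for c in row)
    constant_part = sum(Fraction(abs(b), N) for b in consts)
    return linear_part + constant_part


def twin_system():
    # n -> (n, n + 2): d = 1, t = 2.
    return [[1], [1]], [0, 2]


def goldbach_system(N):
    # n -> (n, N - n): pairs of primes summing to N.
    return [[1], [-1]], [0, N]


for N in (10, 1000, 10**6):
    print(N, system_size(*twin_system(), N), system_size(*goldbach_system(N), N))
```

The twin prime system has size $2 + 2/N$ and the Goldbach system has size exactly 3. Both therefore stay bounded as $N$ grows, which is what "bounded size at scale $N$" requires.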



In order to count the number of prime points on an affine lattice, it is convenient to use the von Mangoldt function $\Lambda : \mathbb{Z} \to \mathbb{R}^+$, defined by setting $\Lambda(n) := \log p$ when $n > 1$ is a power of a prime $p$, and $\Lambda(n) = 0$ otherwise (in particular, $\Lambda(n) = 0$ whenever $n \le 0$). We are then interested in estimating the sum

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \Lambda(\psi_i(n)) \tag{1.2}$$

where $K$ is a convex subset of $\mathbb{R}^d$ and $[t] := \{1, \ldots, t\}$.
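The sum (1.2) can be evaluated directly for small examples. The sketch below implements $\Lambda$ by trial division and sums $\prod_i \Lambda(\psi_i(n))$ over an axis-parallel box. Restricting $K$ to a box is an assumption made only for illustration, and the function names are hypothetical.

```python
import math
from itertools import product


def von_mangoldt(n):
    """Lambda(n) = log p if n = p^k with p prime and k >= 1, and 0 otherwise."""
    if n <= 1:
        return 0.0
    # Smallest prime factor by trial division (fine for small n).
    p = next((q for q in range(2, math.isqrt(n) + 1) if n % q == 0), n)
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0


def weighted_count(forms, box):
    """The sum (1.2) over an axis-parallel box K = prod_j [lo_j, hi_j].

    forms is a list of (coeffs, const) pairs, psi_i(n) = coeffs . n + const.
    """
    total = 0.0
    for n in product(*(range(lo, hi + 1) for lo, hi in box)):
        weight = 1.0
        for coeffs, const in forms:
            weight *= von_mangoldt(sum(c * x for c, x in zip(coeffs, n)) + const)
            if weight == 0.0:  # one factor vanished, so the product is 0
                break
        total += weight
    return total


twins = [((1,), 0), ((1,), 2)]
print(weighted_count(twins, [(1, 100)]))
```

Pairs such as $(2, 4)$ and $(7, 9)$ also contribute to this sum, because $\Lambda$ is supported on prime powers and not only on primes.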

Remark. We do not necessarily assume that $\Psi$ is injective, that is to say we allow the sum in (1.2) to count a single prime point repeatedly. This freedom will be convenient for us at a later stage of the argument when we increase the number $d$ of parameters in order to place $\Psi$ in a certain normal form. However, in most applications of interest it will indeed be the case that $\Psi$ is injective, and so the prime points are counted without multiplicity.

The prime number theorem asserts that the average value of $\Lambda(n)$ is 1 for positive $n$ and 0 for negative $n$, so it is first natural (cf. Cramér's model for the primes) to consider the much simpler sum

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} 1_{\mathbb{R}^+}(\psi_i(n)),$$

where we use $1_E$ to denote the indicator of a set $E$ (thus $1_E(x) = 1$ when $x \in E$ and $1_E(x) = 0$ otherwise). Let us assume that the convex body $K$ is contained in the box $[-N, N]^d$ for some large integer $N$, and let us also assume the size bounds $\|\Psi\|_N \le L$ for some $L > 0$. Then a simple volume packing argument (see the Appendix) yields the asymptotic

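$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} 1_{\mathbb{R}^+}(\psi_i(n)) = \beta_\infty + O_{d,t,L}(N^{d-1}) = \beta_\infty + o_{d,t,L}(N^d) \tag{1.3}$$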

where the archimedean factor $\beta_\infty$ is defined by

$$\beta_\infty := \operatorname{vol}_d\big(K \cap \Psi^{-1}((\mathbb{R}^+)^t)\big) \tag{1.4}$$

(see the section on notation for our conventions concerning asymptotic notation). Note that the main term $\beta_\infty$ is typically of size $N^d$ or so. One can be much more precise about the nature of the error term, but we will not be concerned with quantitative decay rates here. Indeed the rates provided by our later arguments will be poor and often ineffective, and will dominate whatever gains one could extract from the error term in (1.3).
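Here is a small exact check of (1.3). It assumes $K = [-N, N]^2$ and uses the ternary system $\Psi(n_1, n_2) = (n_1, n_2, N - n_1 - n_2)$, for which $\beta_\infty$ is the area $N^2/2$ of the triangle $\{x, y > 0,\ x + y < N\}$. The sketch counts lattice points with every $\psi_i(n) > 0$ and compares the count with $\beta_\infty$. The function names are illustrative only.

```python
from itertools import product


def positive_count(forms, N):
    """Left side of (1.3) with K = [-N, N]^d: the number of n with every psi_i(n) > 0.

    forms is a list of (coeffs, const) pairs, psi_i(n) = coeffs . n + const.
    """
    d = len(forms[0][0])
    count = 0
    for n in product(range(-N, N + 1), repeat=d):
        if all(sum(c * x for c, x in zip(coeffs, n)) + const > 0
               for coeffs, const in forms):
            count += 1
    return count


def ternary(N):
    # Psi(n1, n2) = (n1, n2, N - n1 - n2): ordered triples summing to N.
    return [((1, 0), 0), ((0, 1), 0), ((-1, -1), N)]


for N in (50, 100, 150):
    exact = positive_count(ternary(N), N)
    beta_inf = N * N / 2  # area of the triangle {x > 0, y > 0, x + y < N}
    print(N, exact, beta_inf, exact - beta_inf)
```

The exact count here is $(N-1)(N-2)/2$. The discrepancy from $\beta_\infty$ is therefore $-(3N-2)/2 = O(N^{d-1})$ with $d = 2$, which is consistent with (1.3).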

In view of (1.3) and the prime number theorem, one might naively conjecture that the expression (1.2) also enjoys the asymptotic $\beta_\infty + o_{d,t,L}(N^d)$. However this is not the case due to local obstructions at small moduli. For instance, we have

$$\sum_{n=1}^{N} \Lambda(qn + b) = \Lambda_{\mathbb{Z}_q}(b) N + o_q(N) \tag{1.5}$$

whenever $q \ge 1$ and $|b| \le q$, where $\Lambda_{\mathbb{Z}_q} : \mathbb{Z} \to \mathbb{R}^+$ is the local von Mangoldt function, that is the $q$-periodic function defined by setting $\Lambda_{\mathbb{Z}_q}(b) := \frac{q}{\phi(q)}$ when $b$ is coprime to $q$ and $\Lambda_{\mathbb{Z}_q}(b) = 0$ otherwise. Here $\mathbb{Z}_q := \mathbb{Z}/q\mathbb{Z}$ is the cyclic group of order $q$ and $\phi(q) := |\mathbb{Z}_q^*|$ is the Euler totient function. We shall refer to (1.5) as the prime number theorem in APs. A well-known quantitative version of this result is the Siegel-Walfisz theorem, which establishes the asymptotic (1.5) uniformly in the range $q \le \log^A N$ for any fixed $A$. In this range, the $o$-term is ineffective, and if one wishes for an effective error term it is necessary to restrict to $q \le \log^{2-\delta} N$ for some $\delta > 0$. See [?, p. 123] for details.
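As a numerical illustration of (1.5), the sketch below builds a table of $\Lambda$ with a smallest-prime-factor sieve. It then compares $\frac{1}{N}\sum_{n \le N} \Lambda(qn+b)$ with $\Lambda_{\mathbb{Z}_q}(b) = q/\phi(q)$ (or 0). This is only an empirical check at one finite $N$, not evidence about uniformity in $q$. The helper names are hypothetical.

```python
import math


def mangoldt_table(M):
    """lam[n] = Lambda(n) for 0 <= n <= M, via a smallest-prime-factor sieve."""
    spf = list(range(M + 1))
    for p in range(2, math.isqrt(M) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, M + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0.0] * (M + 1)
    for n in range(2, M + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:  # n is a power of its smallest prime factor
            lam[n] = math.log(p)
    return lam


def ap_average(q, b, N):
    """(1/N) * sum_{n=1}^{N} Lambda(q n + b); (1.5) predicts the limit Lambda_{Z_q}(b)."""
    lam = mangoldt_table(q * N + abs(b))
    return sum(lam[q * n + b] for n in range(1, N + 1)) / N


def local_mangoldt(q, b):
    """Lambda_{Z_q}(b) = q / phi(q) if gcd(b, q) = 1, else 0."""
    phi = sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)
    return q / phi if math.gcd(b, q) == 1 else 0.0


N = 10**5
for q, b in [(3, 1), (3, 2), (3, 0)]:
    print(q, b, round(ap_average(q, b, N), 4), local_mangoldt(q, b))
```

For $q = 3$ the averages for $b = 1, 2$ should come out close to $3/2$. For $b = 0$ only the powers of 3 contribute, so the average is essentially 0.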

More generally, given a system $\Psi = (\psi_1, \ldots, \psi_t)$ of affine-linear forms, one can define the local factor $\beta_q$ for any integer $q \ge 1$ by the formula

$$\beta_q := \mathbb{E}_{n \in \mathbb{Z}_q^d} \prod_{i \in [t]} \Lambda_{\mathbb{Z}_q}(\psi_i(n)). \tag{1.6}$$

The symbol $\mathbb{E}$ denotes expectation or averaging; see the section on notation for more details. From the Chinese remainder theorem we see that this factor is multiplicative; indeed we have $\beta_q = \prod_{p | q} \beta_p$, where the product is over all primes $p$ dividing $q$. We then have

Conjecture 1.2 (Generalised Hardy-Littlewood conjecture). Let $N, d, t, L$ be positive integers, and let $\Psi = (\psi_1, \ldots, \psi_t)$ be a system of affine-linear forms with size $\|\Psi\|_N \le L$. Let $K \subset [-N, N]^d$ be a convex body. Then we have

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \Lambda(\psi_i(n)) = \beta_\infty \prod_p \beta_p + o_{t,d,L}(N^d) \tag{1.7}$$

More generally, we adopt the convention that whenever a product ranges over $p$, that $p$ is understood to be restricted to the primes.
where the archimedean factor $\beta_\infty$ and the local factors $\beta_p$ for each prime $p$ were defined in (1.4) and (1.6).
Roughly speaking, this conjecture asserts that $\Lambda$ "behaves like" the independent product of $1_{\mathbb{R}^+}$ and $\Lambda_{\mathbb{Z}_p}$, as $p$ ranges over primes. In typical applications, the quantities $\beta_\infty$ and $\beta_p$ are quite easy to compute explicitly: see the examples below. We shall refer to the quantity $\prod_p \beta_p$ as the singular product. The local factors $\beta_p$ can be easily estimated:

Lemma 1.3 (Local factor bounds). With the hypotheses of Conjecture 1.2 we have $\beta_p = 1 + O_{t,d,L}(p^{-1})$. If furthermore no two of the forms $\psi_1, \ldots, \psi_t$ are affinely related (i.e. no two of the forms $\dot\psi_1, \ldots, \dot\psi_t$ are parallel), or if $p > C(t, d, L) N$ for some sufficiently large constant $C(t, d, L)$, then we have $\beta_p = 1 + O_{t,d,L}(p^{-2})$.

Proof. Without loss of generality we may assume $p$ to be large compared to $t, d, L$, as the claim is trivial otherwise. Let $n$ be selected uniformly at random from $\mathbb{Z}_p^d$. Since the $\psi_i$ are non-constant (and their coefficients are small compared to $p$), we easily see that $\Lambda_{\mathbb{Z}_p}(\psi_i(n))$ will equal $\frac{p}{p-1}$ with probability $1 - \frac{1}{p}$, and 0 otherwise. In particular the product in (1.6) is equal to $(\frac{p}{p-1})^t = 1 + O_t(\frac{1}{p})$ with probability $1 - O_t(\frac{1}{p})$ and zero otherwise, which gives the first bound on $\beta_p$. Now suppose that either no two of $\psi_1, \ldots, \psi_t$ are affinely related, or that $p > C(t, d, L) N$ for some sufficiently large $C(t, d, L)$. Then for any $1 \le i < j \le t$, we see from elementary linear algebra that $\psi_i(n)$ and $\psi_j(n)$ will simultaneously be divisible by $p$ with probability $O(\frac{1}{p^2})$; the point is that the hypotheses imply that $\psi_i$ and $\psi_j$ cannot be linear multiples of each other modulo $p$. The desired bound on $\beta_p$ then follows from a simple application of the Bonferroni inequalities (that is, the fact that truncations of the inclusion-exclusion formula give upper and lower bounds alternately). □

In particular we see that the singular series $\prod_p \beta_p$ is always convergent (though it could vanish, thanks to the presence of the small primes $p = O_{t,d,L}(1)$).
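The local factors in (1.6) are finite averages and can be computed exactly by enumerating $\mathbb{Z}_q^d$. The sketch below does this with exact fractions. It illustrates multiplicativity for the twin prime system ($\beta_6 = \beta_2 \beta_3 = 2 \cdot \frac{3}{4} = \frac{3}{2}$, and $\beta_4 = \beta_2$). It also shows the two regimes of Lemma 1.3 for the affinely related Goldbach system $n \mapsto (n, 30 - n)$: $\beta_p = 1 + \frac{1}{p-1}$ when $p \mid 30$, and $\beta_p = 1 - \frac{1}{(p-1)^2}$ otherwise. The encoding of forms and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import gcd


def totient(q):
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)


def local_factor(forms, q):
    """beta_q from (1.6), computed exactly by enumerating n in Z_q^d.

    forms is a list of (coeffs, const) pairs, psi_i(n) = coeffs . n + const.
    """
    d = len(forms[0][0])
    good = sum(
        1
        for n in product(range(q), repeat=d)
        if all(gcd(sum(c * x for c, x in zip(cf, n)) + b, q) == 1 for cf, b in forms)
    )
    # Each factor Lambda_{Z_q} is q/phi(q) on the "good" residues and 0 elsewhere.
    return Fraction(good, q**d) * Fraction(q, totient(q)) ** len(forms)


twins = [((1,), 0), ((1,), 2)]
print(local_factor(twins, 2), local_factor(twins, 3), local_factor(twins, 6))


def goldbach(N):
    # n -> (n, N - n); the two forms are affinely related.
    return [((1,), 0), ((-1,), N)]


for p in (3, 5, 7, 11):
    print(p, local_factor(goldbach(30), p))
```
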

A straightforward argument shows that Conjecture 1.2 implies a conjecture which counts primes more explicitly:

Conjecture 1.4 (Generalised Hardy-Littlewood conjecture, again). Let $N, d, t, L, K$ be as in Conjecture 1.2. Then

$$|K \cap \mathbb{Z}^d \cap \Psi^{-1}(\mathcal{P}^t)| = \#\{n \in K \cap \mathbb{Z}^d : \psi_1(n), \ldots, \psi_t(n) \text{ prime}\} = \frac{1 + o_{t,d,L}(1)}{\log^t N}\, \beta_\infty \prod_p \beta_p + o_{t,d,L}\Big(\frac{N^d}{\log^t N}\Big). \tag{1.8}$$

Remarks. It would be slightly more accurate to replace the factor $\beta_\infty / \log^t N$ in (1.8) with a more precise integral expression, but the difference between these two expressions can be absorbed into the qualitative $o_{t,d,L}(\cdot)$ error terms. In most (though not quite all) cases, the singular series $\prod_p \beta_p$ is bounded by $O_{t,d,L}(1)$, which allows one to absorb the first error term into the second. Informally speaking, this conjecture asserts that the probability that a randomly selected point in $\Psi(\mathbb{Z}^d) \cap \mathbb{Z}_+^t$ of magnitude $N$ is a prime point is asymptotically $\frac{1}{\log^t N} \prod_p \beta_p$.

One could view this as a (very simple) manifestation of the Lefschetz principle.



Sketch proof of Conjecture 1.4 assuming Conjecture 1.2. Let $0 < \varepsilon < 1$ be a small quantity (depending on $N, d, t, L$) to be chosen later. The contribution to (1.8) where $\min_{1 \le i \le t} |\psi_i(n)| \le N^{1-\varepsilon}$ can easily be shown to be $O_{t,d,L,\varepsilon}(N^{d-\varepsilon/2})$ by crude estimates; the analogous contribution to (1.7) can similarly be shown to be $O_{t,d,L,\varepsilon}(N^{d-\varepsilon/2})$. The contribution to (1.7) where at least one of the $\psi_i(n)$ is a power $p^2, p^3, \ldots$ of a prime $p$ can similarly be shown to be $o_{t,d,L}(N^d)$. Finally, for the remaining non-zero contributions to (1.7), the quantity $\prod_{i \in [t]} \Lambda(\psi_i(n))$ is equal to $(1 + O(t\varepsilon)) \log^t N$. Putting all this together, we see that the left-hand side of (1.8) is

$$\frac{1 + O(t\varepsilon)}{\log^t N} \Big(\beta_\infty \prod_p \beta_p + o_{t,d,L}(N^d) + O_{t,d,L,\varepsilon}(N^{d-\varepsilon/2})\Big) + O_{t,d,L,\varepsilon}(N^{d-\varepsilon/2}).$$

Setting $\varepsilon$ to be a sufficiently slowly decaying function of $N$ (for fixed $t, d, L$) we obtain the claim. □



Note that the case $d = t = 1$ of the generalised Hardy-Littlewood conjecture is essentially the prime number theorem in APs (1.5). We have been referring to the generalised Hardy-Littlewood conjecture because Hardy and Littlewood [?] in fact only conjectured an asymptotic for the number of $n \le N$ for which the forms $n + b_1, \ldots, n + b_t$ are all prime. If this were generalised to deal with the case of forms $a_1 n + b_1, \ldots, a_t n + b_t$ (the case $d = 1$ of Conjecture 1.2), then a $d$-parameter version along the lines we have been discussing would follow easily by holding $d - 1$ of the variables fixed and summing in the remaining one. One has the impression that, had they thought to ask the question, Hardy and Littlewood would easily have produced a conjecture for the asymptotic formula. The name of Dickson is sometimes associated to this circle of ideas. In the 1904 paper [12], he noted the obvious necessary condition on the $a_i, b_i$ in order that the forms $a_1 n + b_1, \ldots, a_t n + b_t$ might all be prime infinitely often and suggested that this condition might also be sufficient.

Dickson also suggested that the "experts in the new Dirichlet theory" try their hand at establishing this. His hope has yet to be realised, however, since the $d = 1$, $t > 1$ case of Conjecture 1.2 seems to be extremely difficult. The twin prime, Sophie Germain, and weak even Goldbach conjectures, for instance, follow easily from the $d = 1$, $t = 2$ case of the conjecture. These cases are probably well beyond the reach of current technology, although we remark that if one replaces the von Mangoldt function $\Lambda$ with substantially simpler weight functions arising from the Selberg sieve then such asymptotics can be obtained by standard sieve theory methods (see Theorem D.3). This in turn leads to upper bounds on (1.2) which differ from (1.7) only by a multiplicative constant depending only on $d, t, L$.



That is, the conjecture that every sufficiently large even number is the sum of two primes.



LINEAR EQUATIONS IN PRIMES 






Note also that it is possible to establish the case $d = 1$, $t > 1$ of the Hardy-Littlewood conjecture on average over the choice of forms $\psi_1, \ldots, \psi_t$ in a certain sense: see [3]. This essentially amounts to increasing $d$, which can place one back in the "finite complexity" regime discussed below.

Complexity. We will not make any progress on the d = 1, t > 1 case here, but 
instead focus on the substantially simpler cases when d > 1 and the system is "finite 
complexity" in the following sense. 

Definition 1.5 (Complexity). Let $\Psi = (\psi_1, \ldots, \psi_t)$ be a system of affine-linear forms. If $1 \le i \le t$ and $s \ge 0$, we say that $\Psi$ has $i$-complexity at most $s$ if one can cover the $t - 1$ forms $\{\psi_j : j \in [t] \setminus \{i\}\}$ by $s + 1$ classes, such that $\psi_i$ does not lie in the affine-linear span of any of these classes. The complexity of $\Psi$ is defined to be the least $s$ for which the system has $i$-complexity at most $s$ for all $1 \le i \le t$, or $\infty$ if no such $s$ exists.

Remark. It is easy to see that one can replace "cover ... by" by "partition ... into" in the above definition without affecting the definition of $i$-complexity or complexity. While partitions are slightly more natural here than covers, we prefer to use covers as it makes it a little easier to compute the complexity in some cases.

Examples 1. The system $\Psi(n_1, \ldots, n_d) := (n_1, \ldots, n_d)$, which counts $d$-tuples of independent primes, has complexity 0, because no form lies in the affine span of all the other forms. For any $k \ge 2$, the system $\Psi(n_1, n_2) := (n_1, n_1 + n_2, \ldots, n_1 + (k-1)n_2)$, which counts arithmetic progressions of primes of length $k$, has complexity $k - 2$, because each form does not lie in the affine span of any other individual form, though it is in the affine span of any two other forms. The system $\Psi(n_1, n_2) := (n_1, n_2, N - n_1 - n_2)$, which counts triples of primes that sum to a fixed number $N$, has complexity 1. The system $\Psi(n_1, n_2) := (n_1, n_2, n_1 + n_2 - 1, n_1 + 2n_2 - 2)$, which counts progressions of primes of length three whose common difference $n_2 - 1$ is one less than a prime, has complexity 2. The system $\Psi(n_1) := (n_1, n_1 + 2)$, which counts twin primes, has infinite complexity. So too does the system $\Psi(n_1) := (n_1, N - n_1)$, which counts pairs of primes which sum to a fixed number $N$, as well as $\Psi(n_1) := (n_1, 2n_1 + 1)$, which counts Sophie Germain primes. More generally, any system with $d = 1$ and $t > 1$ has infinite complexity.

Example 2 (Cubes). Let $d \ge 2$ and $t := 2^{d-1}$. Then the system

$$\Psi(n_1, \ldots, n_d) := \Big(n_1 + \sum_{j \in A} n_j\Big)_{A \subseteq \{2, \ldots, d\}}$$

(which counts $(d-1)$-dimensional cubes whose vertices are all prime) has a very large value of $t$, but has complexity at most $d - 2$. For instance, if one considers the form $n_1$, then one can cover the other $t - 1$ forms by $d - 1$ classes, with the $i$-th class consisting of those forms which involve $n_{i+1}$; then $n_1$ is not in the affine span of any of these classes, because the $i$-th class always assigns the same coefficient to both $n_1$ and $n_{i+1}$. The other forms can be treated similarly after "reflecting" the cube appropriately.

Example 3 (IP$_0$ cubes). Let $d \ge 1$ and $t := 2^d - 1$. Then the system

$$\Psi(n_1, \ldots, n_d) := \Big(1 + \sum_{j \in A} n_j\Big)_{\emptyset \ne A \subseteq \{1, \ldots, d\}},$$

which counts $d$-dimensional cubes pinned at the origin whose remaining vertices are one less than a prime, also has a large value of $t$ but has complexity at most $d - 1$, for reasons similar to the previous example.

In fact in Example 2 the complexity is exactly $d - 2$, whilst in Example 3 it is exactly $d - 1$. We leave the proofs to the reader.

Example 4 (Balog's example). Let $d \ge 2$ and $t := d(d+1)/2$. Then the system

$$\Psi(n_1, \ldots, n_d) := (n_i + n_j + 1)_{1 \le i \le j \le d},$$

which counts $d$-tuples of odd primes $p_1, \ldots, p_d$, all of whose midpoints $\frac{p_i + p_j}{2}$ are also prime, has complexity 1, even though $t$ is quite large. Indeed, if one considers the form $n_i + n_j + 1$ with $i < j$, one can partition the other $t - 1$ forms into two classes, those which do not involve $n_i$, and those which do involve $n_i$ (and hence do not involve $n_j$), and $n_i + n_j + 1$ is an affine-linear combination of neither of these two classes. If instead one considers the form $n_i + n_i + 1 = 2n_i + 1$, one can partition the other $t - 1$ forms into two classes, those which involve $n_i$ (and one other $n_j$), and those which do not involve $n_i$ at all, and again $2n_i + 1$ is an affine-linear combination of neither of these two classes.
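For small systems, Definition 1.5 can be checked mechanically. The sketch below makes several choices of its own. It encodes a form as a tuple $(c_1, \ldots, c_d, c_0)$ of linear coefficients followed by the constant term. It tests membership in the affine-linear span by comparing ranks over $\mathbb{Q}$ (the constant form 1 is added to each class). It searches over partitions rather than covers, which the Remark above says is equivalent. The search is exponential in $t$, so this is only a toy for checking the examples above, and all function names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank over Q of a list of integer vectors, by Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def in_affine_span(form, cls):
    """True if form is a linear combination of the forms in cls plus a constant."""
    d = len(form) - 1
    base = [list(f) for f in cls] + [[0] * d + [1]]  # include the constant form 1
    return rank(base + [list(form)]) == rank(base)


def partitions(items, k):
    """Yield every partition of the list items into at most k non-empty classes."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest, k):
        for i in range(len(part)):  # put `first` into an existing class...
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        if len(part) < k:  # ...or open a new class
            yield part + [[first]]


def i_complexity_at_most(forms, i, s):
    others = forms[:i] + forms[i + 1:]
    return any(not any(in_affine_span(forms[i], cls) for cls in part)
               for part in partitions(others, s + 1))


def complexity(forms):
    """Least s with i-complexity <= s for every i (Definition 1.5); None means infinite."""
    t = len(forms)
    for s in range(max(t - 1, 1)):
        if all(i_complexity_at_most(forms, i, s) for i in range(t)):
            return s
    return None


ap4 = [(1, 0, 0), (1, 1, 0), (1, 2, 0), (1, 3, 0)]  # n1 + j*n2, j = 0..3
print(complexity(ap4), complexity([(1, 0), (1, 2)]))
```

The tests below reproduce the complexities stated in Examples 1 to 4. They also check the bound $t - \dim$ of Lemma 1.6 for progressions of length 4.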

The complexity is a little difficult to compute directly, but the following lemma gives 
some easy bounds on this quantity. 

Lemma 1.6 (Complexity bounded by codimension). Let $\Psi = (\psi_1, \ldots, \psi_t)$ be a system of affine-linear forms. Then this system has finite complexity if and only if no two of the $\psi_i$ are affinely dependent. Furthermore, in this case the complexity of the system is less than or equal to $t - \dim(\Psi)$.

Proof. If two of the forms $\psi_i$ and $\psi_j$ are affinely related, then it is not possible for the $i$-complexity to be finite, as $\psi_i$ will lie in the affine span of any collection of forms which contains $\psi_j$. Conversely, if no two of the $\psi_i$ are affinely related, then the $i$-complexity is at most $t - 2$, as we can partition the $t - 1$ forms $\{\psi_j : j \in [t] \setminus \{i\}\}$ into singletons. This gives the first claim of the lemma.

Now suppose that no two of the $\psi_i$ are affinely dependent. Write $r := \dim(\Psi)$ for the dimension of the span of the linear parts $\dot\psi_1, \ldots, \dot\psi_t$. Choose any form, say $\psi_1$; its linear part $\dot\psi_1$ is nonzero, since the forms are non-constant. Relabelling if necessary, we may suppose that $\{\dot\psi_1, \ldots, \dot\psi_r\}$ is a basis for this span. Consider the set $\{\psi_2, \ldots, \psi_r\}$ along with the singleton sets $\{\psi_{r+1}\}, \ldots, \{\psi_t\}$. Clearly $\psi_1$ is not in the affine-linear span of any such set, and so the system has 1-complexity at most $t - r$. Since this is true with any $\psi_i$ in place of $\psi_1$, the claim follows. □

Remark. This lemma is sharp in all the cases treated in Examples 1, but is very far from sharp in Examples 2–4. It asserts that the infinite complexity systems are precisely those which encode a "binary" problem such as the twin prime, Goldbach, Sophie Germain, or prime tuples conjectures. Observe from Lemma 1.6 and Lemma 1.3 that if the system has finite complexity, then $\beta_p = 1 + O_{t,d,L}(p^{-2})$ and so the singular series $\prod_p \beta_p$ is either zero, or is bounded above and below by constants depending only on $t, d, L$. In particular we can eliminate the first error term in (1.8) in this setting.
For systems of complexity 0, the generalised Hardy-Littlewood conjecture follows easily from the prime number theorem in APs (1.5). For systems of complexity 1, the conjecture can be treated by the Hardy-Littlewood circle method (see e.g. [?]). Systems of complexity 2 or higher, on the other hand, are largely out of reach of the circle method and the conjecture has remained open in these cases.

We mention two directions in which a partial approach to high complexity cases of the generalised Hardy-Littlewood conjecture has been made. The first is that a version of the conjecture remains true if one is willing to enlarge sufficiently many of the $\Lambda$ factors, replacing primes with some notion of an almost prime, and adjust the singular series appropriately; see for instance Theorem D.3 for a simplified version of this result. One consequence of this is that upper bounds in (1.7) (or (1.8)) are known which are only off by a multiplicative constant of $O_{t,d,L}(1)$.

For certain special systems a lower bound of the correct order of magnitude is available. 
For some systems such as the cube systems in Example 2 this is rather simple, involving
nothing more than a few applications of the Cauchy-Schwarz inequality, despite the fact 
that such systems can have arbitrarily high complexity. However, the task of obtaining 
asymptotics here is just as difficult as obtaining asymptotics for other systems; see [32] 
for some related discussion of this phenomenon. 

There is also the system $\Psi(n_1, n_2) := (n_1, n_1 + n_2, \ldots, n_1 + (k-1)n_2)$ of arithmetic progressions of length $k$, for which the powerful tool of Szemerédi's theorem [39] was available. Despite the fact that these systems can have arbitrarily high complexity, a lower bound for (1.7) and (1.8) was established which was again only off by a multiplicative constant. In particular this implied that the primes contain arbitrarily long arithmetic progressions; see [21].

Our arguments in this paper borrow many ideas and results from [21], in particular drawing heavily on the transference principle developed in that paper. However we shall not use Szemerédi's theorem in this paper, as it does not apply to the general systems of affine-linear forms studied here. Roughly speaking, one only expects Szemerédi-type theorems for systems which are homogeneous (so $\Psi(0) = 0$) and translation invariant, that is the lattice $\Psi(\mathbb{Z}^d)$ contains the diagonal generator $(1, \ldots, 1)$. In any case Szemerédi's theorem only provides lower bounds and not asymptotics.

Main result. Our main result settles the generalised Hardy-Littlewood conjecture 
for any system of affine-linear forms of finite complexity, conditional on two simpler, 
partially resolved, conjectures. 



Main Theorem (Generalised Hardy-Littlewood conjecture, finite complexity case). Suppose that the inverse Gowers-norm conjecture GI(s) and the Möbius and nilsequences conjecture MN(s) are true for some finite $s \ge 1$. Both of these conjectures will be stated formally later in the paper. Then the generalised Hardy-Littlewood conjecture is true for all systems of affine-linear forms of complexity at most $s$.
We have deferred the precise statement of the conjectures GI(s) and MN(s) to a later section on account of the fact that both of them are somewhat technical to state formally. The impatient reader may wish to jump to that section to view these conjectures, but for now we settle for informal one-line statements of them.

The inverse Gowers-norm conjecture GI(s) gives an explicit criterion as to when a 
bounded sequence of complex numbers is "Gowers uniform of order s", this being a 
measure of pseudorandomness of the sequence; namely, this Gowers uniformity holds 
whenever the sequence fails to be correlated with any s-step nilsequence. 

The Möbius and nilsequences conjecture MN(s) asserts that the Möbius function $\mu(n)$ (which is of course closely related to $\Lambda(n)$) does indeed have negligible correlation with all $s$-step nilsequences.

Neither of these two conjectures is fully resolved at present. However, the case $s = 1$ is classical and was essentially already present in the work of Hardy-Littlewood and Vinogradov, though not in this language. The conjecture GI(2) was settled more recently in [26], while the conjecture MN(2) was settled in [27]. Because of this, we have the following unconditional result:

Corollary 1.7. The generalised Hardy-Littlewood conjecture is true for all systems of affine-linear forms of complexity at most 2. In particular, thanks to Lemma 1.6, the generalised Hardy-Littlewood conjecture is true for any system $\Psi = (\psi_1, \ldots, \psi_t)$ in which no two $\psi_i, \psi_j$ are affinely dependent, and such that $\operatorname{codim}(\dot\Psi(\mathbb{R}^d)) \le 2$.

We expect both GI(s) and MN(s) to be settled shortly for general $s$, and hope to report on progress on both of these conjectures in the not-too-distant future. We therefore expect to settle the generalised Hardy-Littlewood conjecture entirely in the finite complexity case, or in other words we should be able to remove the last hypothesis in Corollary 1.7. The only unresolved case of the generalised Hardy-Littlewood conjecture would then be the presumably very hard "binary" or "infinite complexity" case in which two or more of the forms are affinely related.

Let us now state some particular new consequences of our results. The first three are unconditional, while the last two require further progress on the inverse Gowers-norm and Möbius and nilsequences conjectures.

Example 5 (APs of length 4). The number of 4-tuples of primes $p_1 < p_2 < p_3 < p_4 \le N$ which lie in arithmetic progression is $(1 + o(1)) \frac{\mathfrak{S}_1 N^2}{\log^4 N}$, where

$$\mathfrak{S}_1 := \frac{3}{4} \prod_{p \ge 5} \left(1 - \frac{3p - 1}{(p-1)^3}\right) \approx 0.4764.$$

This follows from Corollary 1.7 with the system $\Psi(n_1, n_2) := (n_1, n_1 + n_2, n_1 + 2n_2, n_1 + 3n_2)$, with $K$ being the convex region $\{(n_1, n_2) : 1 \le n_1 \le n_1 + 3n_2 \le N\}$; one has $\beta_\infty = N^2/6$, $\beta_2 = 4$, $\beta_3 = 9/8$, and $\beta_p = 1 - \frac{3p - 1}{(p-1)^3}$ for $p \ge 5$. Note that the results in [?] do not give this asymptotic, instead yielding a lower bound of $(c + o(1)) \frac{N^2}{\log^4 N}$ for some explicitly computable but rather small constant $c > 0$.

Note added in April 2008: in a recent preprint, the authors have fully resolved the MN(s) conjecture for every $s$.
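The constant in Example 5 can be checked numerically. The sketch below first verifies by brute force over $\mathbb{Z}_p^2$ that the local factors of the 4-term progression system are $\beta_2 = 4$, $\beta_3 = 9/8$ and $\beta_p = 1 - \frac{3p-1}{(p-1)^3}$. It then multiplies $\beta_\infty \beta_2 \beta_3 / N^2 = \frac{1}{6} \cdot 4 \cdot \frac{9}{8} = \frac{3}{4}$ by the product over $5 \le p \le 10^6$. Truncating at $10^6$ is an arbitrary choice, and the function names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product
from math import gcd


def local_factor(forms, p):
    """beta_p from (1.6) for a prime p, by brute force over Z_p^d."""
    d = len(forms[0][0])
    good = sum(1 for n in product(range(p), repeat=d)
               if all(gcd(sum(c * x for c, x in zip(cf, n)) + b, p) == 1 for cf, b in forms))
    return Fraction(good, p**d) * Fraction(p, p - 1) ** len(forms)


def primes_up_to(M):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (M + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(M) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, M + 1, i)))
    return [i for i in range(M + 1) if sieve[i]]


ap4 = [((1, 0), 0), ((1, 1), 0), ((1, 2), 0), ((1, 3), 0)]
for p in (5, 7, 11, 13):  # closed form agrees with brute force
    assert local_factor(ap4, p) == 1 - Fraction(3 * p - 1, (p - 1) ** 3)

S1 = 0.75  # beta_infty * beta_2 * beta_3 / N^2 = (1/6) * 4 * (9/8)
for p in primes_up_to(10**6):
    if p >= 5:
        S1 *= 1 - (3 * p - 1) / (p - 1) ** 3
print(S1)
```

The truncated product should land near 0.476. The neglected tail is of relative size about $3\sum_{p > 10^6} p^{-2}$, which is negligible here.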

Example 6 (APs of length 3 with common difference $p \pm 1$). The number of triples of primes $p_1 < p_2 < p_3 \le N$ in arithmetic progression, in which the common difference $p_2 - p_1$ is equal to a prime plus 1, is $(1 + o(1)) \mathfrak{S}_2 N^2 \log^{-4} N$, where

$$\mathfrak{S}_2 := \prod_{p \ge 3} \left(1 - \frac{p^2 - 4p + 1}{(p-1)^4}\right) \approx 1.0481.$$

The same asymptotic holds for progressions in which $p_2 - p_1$ is a prime minus 1. This follows from a similar application of Corollary 1.7 as in Example 5.
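One way to see where the constant in Example 6 comes from is the parameterisation $(n_1, n_2) \mapsto (n_1, n_2, n_1 + n_2 + 1, n_1 + 2n_2 + 2)$, an assumption made here for illustration. In it, $n_2$ plays the role of the prime $q$ and the common difference is $q + 1$. This system has $\beta_\infty = N^2/4$ and $\beta_2 = 4$, so the constant is $\prod_{p \ge 3} \beta_p$. The sketch checks the closed form $\beta_p = 1 - \frac{p^2 - 4p + 1}{(p-1)^4}$ against brute force and evaluates the product up to $10^6$.

```python
import math
from fractions import Fraction
from itertools import product
from math import gcd


def local_factor(forms, p):
    """beta_p from (1.6) for a prime p, by brute force over Z_p^d."""
    d = len(forms[0][0])
    good = sum(1 for n in product(range(p), repeat=d)
               if all(gcd(sum(c * x for c, x in zip(cf, n)) + b, p) == 1 for cf, b in forms))
    return Fraction(good, p**d) * Fraction(p, p - 1) ** len(forms)


def primes_up_to(M):
    sieve = bytearray([1]) * (M + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(M) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, M + 1, i)))
    return [i for i in range(M + 1) if sieve[i]]


# (n1, n2) -> (p1, q, p2, p3) = (n1, n2, n1 + n2 + 1, n1 + 2*n2 + 2).
forms = [((1, 0), 0), ((0, 1), 0), ((1, 1), 1), ((1, 2), 2)]


def beta_closed(p):
    return 1 - Fraction(p * p - 4 * p + 1, (p - 1) ** 4)


for p in (3, 5, 7, 11, 13):
    assert local_factor(forms, p) == beta_closed(p)

S2 = 1.0  # beta_infty * beta_2 / N^2 = (1/4) * 4
for p in primes_up_to(10**6):
    if p >= 3:
        S2 *= 1 - (p * p - 4 * p + 1) / (p - 1) ** 4
print(S2)
```

Note that $\beta_3 = 9/8 > 1$, so the constant exceeds 1 even though every other factor is below 1.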

Example 7 (Vinogradov 3-primes theorem with a constraint). Let $N$ be a large odd integer. Then the number of distinct representations of $N$ as $p_1 + p_2 + p_3$ in which $p_1 - p_2$ is equal to a prime minus 1 is $(\mathfrak{S}_3(N) + o(1)) \frac{N^2}{\log^4 N}$, where

$$\mathfrak{S}_3(N) := \prod_{p \ge 3,\ p \mid N^3 - N} \left(1 - \frac{p^2 - 4p + 1}{(p-1)^4}\right) \prod_{p \ge 3,\ p \nmid N^3 - N} \left(1 + \frac{4p - 1}{(p-1)^4}\right).$$

Thanks to Lemma 1.3 we see that $\mathfrak{S}_3(N)$ is bounded above and below by absolute positive constants independently of $N$. Again, this result follows from a specific application of Corollary 1.7.
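The two cases in the product for Example 7 can be checked mechanically. The sketch assumes the parameterisation $(n_1, n_2) \mapsto (n_1, n_2, N - n_1 - n_2, n_1 - n_2 + 1)$, with $q = n_1 - n_2 + 1$ prime. For odd $N$ this gives $\beta_\infty = N^2/4$ and $\beta_2 = 4$. Modulo $p \ge 3$ the four forms vanish on four lines, and three of them meet in a point exactly when $N \equiv 0, \pm 1 \pmod p$, that is when $p \mid N^3 - N$. This yields $\beta_p = 1 - \frac{p^2-4p+1}{(p-1)^4}$ in that case and $\beta_p = 1 + \frac{4p-1}{(p-1)^4}$ otherwise. The code compares this closed form with brute force and evaluates a truncated product. The truncation point and names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product
from math import gcd


def local_factor(forms, p):
    """beta_p from (1.6) for a prime p, by brute force over Z_p^d."""
    d = len(forms[0][0])
    good = sum(1 for n in product(range(p), repeat=d)
               if all(gcd(sum(c * x for c, x in zip(cf, n)) + b, p) == 1 for cf, b in forms))
    return Fraction(good, p**d) * Fraction(p, p - 1) ** len(forms)


def primes_up_to(M):
    sieve = bytearray([1]) * (M + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(M) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, M + 1, i)))
    return [i for i in range(M + 1) if sieve[i]]


def system(N):
    # (n1, n2) -> (p1, p2, p3, q) = (n1, n2, N - n1 - n2, n1 - n2 + 1), so p1 - p2 = q - 1.
    return [((1, 0), 0), ((0, 1), 0), ((-1, -1), N), ((1, -1), 1)]


def beta_closed(N, p):
    """Closed form for beta_p, p >= 3, from counting points off four lines mod p."""
    if (N**3 - N) % p == 0:
        return 1 - Fraction(p * p - 4 * p + 1, (p - 1) ** 4)
    return 1 + Fraction(4 * p - 1, (p - 1) ** 4)


def S3(N, P=10**5):
    """Truncated singular product; beta_infty * beta_2 / N^2 = (1/4) * 4 = 1 for odd N."""
    s = 1.0
    for p in primes_up_to(P):
        if p >= 3:
            s *= float(beta_closed(N, p))
    return s


for N in (15, 101, 1001):
    assert all(local_factor(system(N), p) == beta_closed(N, p) for p in (3, 5, 7, 11, 13))
    print(N, S3(N))
```

For even $N$ the factor $\beta_2$ vanishes, which matches the requirement in Example 7 that $N$ be odd.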

Example 8 (APs of length $k$). Let $k \ge 2$ be a fixed integer. Assume the GI(k − 2) conjecture and the MN(k − 2) conjecture. Then the number of $k$-tuples of primes $p_1 < p_2 < \cdots < p_k \le N$ which lie in arithmetic progression is

$$(1 + o(1)) \frac{N^2}{2(k-1) \log^k N} \prod_p \beta_p,$$

where

$$\beta_p := \begin{cases} \dfrac{1}{p} \left(\dfrac{p}{p-1}\right)^{k-1} & \text{if } p \le k, \\[2ex] \left(1 - \dfrac{k-1}{p}\right) \left(\dfrac{p}{p-1}\right)^{k-1} & \text{if } p > k. \end{cases}$$

The $k = 4$ case of this is Example 5, the $k = 3$ case is due to van der Corput [?], and the $k = 1, 2$ cases are equivalent to the prime number theorem. For comparison, the arguments in [?] give an unconditional lower bound of $(c_k + o(1)) \frac{N^2}{\log^k N}$ for some $c_k > 0$.
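The piecewise formula for $\beta_p$ in Example 8 can be checked against the definition (1.6) for the forms $n_1 + j n_2$, $0 \le j < k$. The sketch does this for small $k$ and $p$. It also evaluates the $k = 3$ singular product, which equals $2 \prod_{p \ge 3} (1 - \frac{1}{(p-1)^2})$. That is twice the twin prime constant $0.66016\ldots$, in line with van der Corput's theorem. The function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product
from math import gcd


def beta_ap(k, p):
    """Closed-form local factor for k-term progressions (Example 8), p prime."""
    ratio = Fraction(p, p - 1) ** (k - 1)
    return ratio / p if p <= k else (1 - Fraction(k - 1, p)) * ratio


def beta_brute(k, p):
    """beta_p from (1.6) for the forms n1 + j*n2 (0 <= j < k), by brute force."""
    good = sum(1 for n1, n2 in product(range(p), repeat=2)
               if all(gcd(n1 + j * n2, p) == 1 for j in range(k)))
    return Fraction(good, p * p) * Fraction(p, p - 1) ** k


def primes_up_to(M):
    sieve = bytearray([1]) * (M + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(M) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, M + 1, i)))
    return [i for i in range(M + 1) if sieve[i]]


for k in range(2, 7):
    for p in (2, 3, 5, 7, 11):
        assert beta_ap(k, p) == beta_brute(k, p)

# k = 3: the singular product is 2 * prod_{p >= 3} (1 - 1/(p - 1)^2).
prod3 = 1.0
for p in primes_up_to(10**5):
    prod3 *= float(beta_ap(3, p))
print(prod3)
```

For $k = 2$ every local factor equals 1, which is why that case reduces to the prime number theorem.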

Example 9 ($\mathcal{P} - 1$ and $\mathcal{P} + 1$ are IP$_0$-sets). Assume $s \ge 1$ is such that the GI(s) and MN(s) conjectures are true. Then (thanks to Example 3) there exist infinitely many $(s+1)$-tuples $(n_1, \ldots, n_{s+1})$ of distinct positive integers such that all of the sums $\{\sum_{i \in A} n_i : A \subseteq [s+1], A \ne \emptyset\}$ are equal to a prime minus 1. Similarly for the primes plus 1. In particular, we unconditionally have the new result that there are infinitely many distinct $n_1, n_2, n_3$ such that $n_1, n_2, n_3, n_1 + n_2, n_1 + n_3, n_2 + n_3, n_1 + n_2 + n_3$ are all one less than a prime.
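The unconditional case $s = 2$ of Example 9 can be illustrated by a direct search for small witnesses. The sketch below looks for $n_1 < n_2 < n_3 \le 60$ with every non-empty subset sum plus `shift` prime. Here `shift=1` gives sums that are one less than a prime, and `shift=-1` gives sums that are one more than a prime. This only exhibits examples. It says nothing about infinitude, and the search bound is arbitrary.

```python
import math
from itertools import combinations


def is_prime(n):
    if n < 2:
        return False
    return all(n % q for q in range(2, math.isqrt(n) + 1))


def subset_sums(ns):
    """All sums over non-empty subsets A of the tuple ns."""
    return [sum(c) for r in range(1, len(ns) + 1) for c in combinations(ns, r)]


def ip0_witness(limit, shift=1):
    """First n1 < n2 < n3 <= limit such that every non-empty subset sum + shift is prime."""
    for triple in combinations(range(1, limit + 1), 3):
        if all(is_prime(s + shift) for s in subset_sums(triple)):
            return triple
    return None


print(ip0_witness(60), ip0_witness(60, shift=-1))
```

For instance, $(6, 12, 60)$ works for the primes minus 1, since $7, 13, 61, 19, 67, 73, 79$ are all prime. Likewise $(6, 8, 24)$ works for the primes plus 1, since $5, 7, 23, 13, 29, 31, 37$ are all prime.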



Another consequence of the Main Theorem concerns counting the number of solutions 
in a given range to a system of linear equations, in which all unknowns are required to 
be prime: 
Theorem 1.8 (Linear equations in primes). Assume the GI(s) and MN(s) conjectures. Let $A = (a_{ij})$ be an $s \times t$ matrix of integers, where $s \le t$. Assume the non-degeneracy conditions that $A$ has full rank $s$, and that the only element of the row-space of $A$ over $\mathbb{Q}$ with two or fewer non-zero entries is the zero vector. Let $N > 1$, let $b = (b_1, \ldots, b_s) \in \mathbb{Z}^s$ be a vector in $A\mathbb{Z}^t = \{Ax : x \in \mathbb{Z}^t\}$, and suppose that the coefficients $|a_{ij}|$ and the quantities $|b_i/N|$ are uniformly bounded by some constant $L$. Let $K \subset [-N, N]^t$ be convex. Then we have

$$\sum_{x \in K \cap \mathbb{Z}^t :\, Ax = b} \prod_{i \in [t]} \Lambda(x_i) = \alpha_\infty \prod_p \alpha_p + o_{s,t,L}(N^{t-s}), \tag{1.9}$$

where the local densities $\alpha_p$ are given by

$$\alpha_p := \lim_{M \to \infty} \mathbb{E}_{x \in [-M, M]^t :\, Ax = b} \prod_{i \in [t]} \Lambda_{\mathbb{Z}_p}(x_i) \tag{1.10}$$

and the global factor $\alpha_\infty$ is given by

$$\alpha_\infty := \#\{x \in \mathbb{Z}^t : x \in K,\ Ax = b,\ x_i \ge 0\}. \tag{1.11}$$

Theorem 1.8 follows easily from the Main Theorem and some elementary linear algebra: the details may be found in §4. The quantities $\alpha_p$ and $\alpha_\infty$ can be easily computed in practice. One can also formulate an analogue of Theorem 1.8 which counts prime solutions to $Ax = b$, just as Conjecture 1.4 could be deduced from Conjecture 1.2. We leave the details to the reader. Theorem 1.8 is not the most general consequence of the Main Theorem, but it is rather representative. For instance, it already implies Examples 5–7 (and also implies Example 8 if GI(s) and MN(s) are known for all $s$).

Another simple "qualitative" consequence of the Main Theorem is the following. 

Corollary 1.9 (Qualitative generalised H-L conjecture for finite complexity systems). Suppose that GI(s) and MN(s) are true for some $s \ge 1$. Let $\Psi = (\psi_1, \ldots, \psi_t) : \mathbb{Z}^d \to \mathbb{Z}^t$ be a system of complexity at most $s$, and let $K \subset \mathbb{R}^d$ be an open convex cone, that is to say an open convex set which is closed under dilations. Suppose that we have the following two local solvability conditions:

• (Solvability at $p$) For each prime $p$, there exists $n \in \mathbb{Z}^d$ such that the forms $\psi_1(n), \ldots, \psi_t(n)$ are all coprime to $p$.

• (Solvability at $\infty$) There exists $n \in K \cap \mathbb{Z}^d$ such that $\psi_1(n), \ldots, \psi_t(n) > 0$.

Then there exist infinitely many $n \in K \cap \mathbb{Z}^d$ such that $\psi_1(n), \ldots, \psi_t(n)$ are all prime.

Remark. This significantly generalises the main theorem in [23] that the primes contain infinitely many progressions of length $k$, though for progressions of length $k > 4$ the argument here is conditional on the conjectures GI(k − 2) and MN(k − 2).

Proof. If we truncate $K$ to $[-N, N]^d$, then the hypotheses ensure that $\beta_\infty \gg_{K,d} N^d$ and $\beta_p \ne 0$ for all $p$. From Lemma 1.3 we conclude that $\beta_\infty \prod_p \beta_p \gg_{K,\Psi,d,t} N^d$, and the claim now follows by letting $N \to \infty$. □
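The first local condition in Corollary 1.9 only needs to be checked at finitely many primes. If $p > t$ and $p$ exceeds every coefficient of the linear parts in absolute value, then each non-constant form stays non-constant modulo $p$. Each form then vanishes on exactly $p^{d-1}$ of the $p^d$ residues, so the $t$ forms together vanish on at most $t p^{d-1} < p^d$ of them. The sketch below uses this observation and brute-forces the remaining small primes. It covers solvability at $p$ only, not at $\infty$. The function names and the (coefficients, constant) encoding are illustrative assumptions.

```python
import math
from itertools import product
from math import gcd


def primes_up_to(M):
    sieve = bytearray([1]) * (M + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(M) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, M + 1, i)))
    return [i for i in range(M + 1) if sieve[i]]


def solvable_at(forms, p):
    """Is there n in Z_p^d with every psi_i(n) coprime to p?"""
    d = len(forms[0][0])
    return any(
        all(gcd(sum(c * x for c, x in zip(cf, n)) + b, p) == 1 for cf, b in forms)
        for n in product(range(p), repeat=d)
    )


def locally_solvable(forms):
    """Solvability at every prime p, checking only the primes up to max(t, max |coeff|).

    Above that bound the t forms vanish on at most t * p^(d-1) < p^d residues mod p.
    """
    t = len(forms)
    bound = max([t] + [abs(c) for cf, _ in forms for c in cf])
    return all(solvable_at(forms, p) for p in primes_up_to(bound))


print(locally_solvable([((1,), 0), ((1,), 2)]),          # twin primes
      locally_solvable([((1,), 0), ((1,), 2), ((1,), 4)]))  # n, n+2, n+4 fails at p = 3
```

The triple $(n, n+2, n+4)$ fails because one of the three is always divisible by 3. That is exactly the kind of local obstruction the condition is designed to exclude.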
2. Overview of the paper

This section is a kind of roadmap for the rest of the paper, and is somewhat informal 
in nature. Also, it employs some terminology which will only be rigorously defined in 
later sections. 

The bulk of the paper will be concerned with the proof of the Main Theorem. A substantial portion of our argument consists of reprising the transference principle machinery from [21]. This allows us to model certain unbounded functions, such as $\Lambda$, by bounded ones. Another large component of this paper consists of some facts on nilmanifolds which are essentially contained in papers in the ergodic literature, particularly that of Host and Kra [32]. Unfortunately, as our situation here is slightly different from that in [21], we cannot simply cite the results we need directly from that paper, and for similar reasons we cannot cite the nilmanifold material directly. Thus we have placed a large number of appendices in this paper in which we slightly modify the arguments from these sources to suit our present needs.

In §4 we use linear algebra to deduce Theorem 1.8 from the Main Theorem, and also to reduce the Main Theorem to a simplified form, Theorem 4.5, in which the archimedean factor $\beta_\infty$ is not present and the system is in a certain "normal form". Then we use the "W-trick" from [24] to eliminate the local factors $\beta_p$ and reduce matters to establishing a discorrelation estimate, Theorem 5.2, for certain variants $\Lambda'_{b,W} - 1$ of the von Mangoldt function.

In §6 we recall one of the main ingredients of [21]. This is the idea that the von Mangoldt function $\Lambda$, or more precisely the variants $\Lambda'_{b,W} - 1$, are dominated by a certain enveloping sieve $\nu$ which obeys some good pseudorandomness properties. The verification of these properties is essentially given in [23, Ch. 9, 10]. We take the opportunity, in Appendix D, to give a simpler variant along the lines of unpublished notes of the second author.

In §7 we recall the generalised von Neumann theorem from [23], which allows us to use the pseudorandom enveloping sieve $\nu$ to deduce the desired discorrelation estimate, Theorem 5.2, from a Gowers uniformity estimate on $\Lambda'_{b,W} - 1$. This latter estimate is the content of Theorem 7.2. We in fact provide a more general type of generalised von Neumann theorem: the one in [23] was specific to the case of arithmetic progressions, and did not allow one to count points inside an arbitrary convex body $K$. The basic theory of Gowers uniformity norms is reviewed in Appendix B, whilst the generalised von Neumann theorem itself is proved in Appendix C following some preliminaries on convex geometry in Appendix A.
To prove the Gowers uniformity estimate, we begin by stating in §8 the two conjectures we need, namely the inverse Gowers-norm conjecture GI(s) and the Möbius and nilsequences conjecture MN(s). At this point we pause to present some easy consequences of these conjectures, deducing in §9 some results concerning the behaviour of the Möbius and Liouville functions along systems of linear forms. These functions have an advantage over $\Lambda$, in that they are bounded by 1.

In §10 we apply the transference principle technology from [21] to extend the inverse Gowers-norm conjecture GI(s) to cover functions which are bounded only by a pseudorandom measure. This result, Proposition 10.1, is in a sense the conceptual heart of the paper. Once this is done the matter is reduced to the task of showing that $\Lambda'_{b,W} - 1$ is asymptotically orthogonal to nilsequences. The precise statement of such a result is Proposition 10.2.

At this point we need a technical reduction, replacing a nilsequence by a slightly better behaved averaged nilsequence. This reduction is carried out in §11 and uses some basic structural facts about nilmanifolds and the cubes within them. These facts are somewhat difficult to extract from the literature, so we give them in Appendix E. In preparing this appendix we benefitted much from conversations with Sasha Leibman.

Finally, to show that $\Lambda'_{b,W} - 1$ is asymptotically orthogonal to an averaged nilsequence, we split $\Lambda$ into a "smooth" part $\Lambda^\sharp$ and a "rough" part $\Lambda^\flat$. This is a fairly standard construction in analytic number theory which we learnt from [31]. The contribution of the smooth part $\Lambda^\sharp$ can be handled by the Gowers-Cauchy-Schwarz inequality (B.12), combined with correlation estimates for truncated divisor sums. The latter type of estimates are given in Appendix D; the technology is the same as that used to build the enveloping sieve. The rough part $\Lambda^\flat$ can be handled by the Möbius and nilsequences conjecture MN(s), thus concluding the proof.
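One way to picture such a split is through the classical identity $\Lambda(n) = \sum_{d \mid n} \mu(d) \log(n/d)$, cutting the divisor sum at a level $R$: divisors $d \le R$ give a smooth, sieve-like piece and divisors $d > R$ give the rough remainder. The sketch below is an illustration under our own assumptions only: the sharp cutoff at $R$ and all function names are ours, and the smooth part used later in the paper may well be defined with a smoother truncation. It simply confirms that the two pieces add back up to $\Lambda$.

```python
import math


def mobius(n: int) -> int:
    """Möbius function by trial division (adequate for small n)."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0          # p^2 divides n
            result = -result
        p += 1
    if m > 1:                     # one remaining prime factor
        result = -result
    return result


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = next(q for q in range(2, n + 1) if n % q == 0)   # smallest prime factor
    m = n
    while m % p == 0:
        m //= p
    return math.log(p) if m == 1 else 0.0


def split_lambda(n: int, R: int):
    """Return (smooth, rough): the sum of mu(d) log(n/d) over d | n with d <= R,
    and over d | n with d > R.  Their sum is Lambda(n)."""
    smooth = rough = 0.0
    for d in range(1, n + 1):
        if n % d == 0:
            term = mobius(d) * math.log(n / d)
            if d <= R:
                smooth += term
            else:
                rough += term
    return smooth, rough
```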

In §13 we gather some concluding remarks concerning possible extensions of our results, as well as possibilities for making our estimates effective. We also indicate a proof of (say) the asymptotic in Example 5 which is somewhat shorter than the one given here, but is harder to motivate from the conceptual point of view.

In §14 we gather some remarks concerning bounds for the error terms in our main results. The most interesting part of this discussion focusses on what can be said assuming GRH, since unconditionally all error terms are at present completely ineffective.

The remainder of the paper consists of appendices which supply proofs for various results 
that we need, but which require techniques which are either standard or somewhat 
outside the line of the main portion of the paper. 
3. General notation



Our conventions for asymptotic notation are as follows. We use $O_{a_1,\dots,a_k}(X)$ to denote a quantity which is bounded in magnitude by $C_{a_1,\dots,a_k} X$ for some finite positive quantity $C_{a_1,\dots,a_k}$ depending only on $a_1, \dots, a_k$; we also write $Y \ll_{a_1,\dots,a_k} X$ or $X \gg_{a_1,\dots,a_k} Y$ for the estimate $|Y| \le O_{a_1,\dots,a_k}(X)$.

In this paper we always think of the parameter $N$ as "large" or "tending to infinity". Thus we use $o_{a_1,\dots,a_k}(X)$ to denote a quantity bounded by $c_{a_1,\dots,a_k}(N) X$, where $c_{a_1,\dots,a_k}(N)$ is a quantity which goes to zero as $N \to \infty$ for each fixed $a_1, \dots, a_k$. We do not assume that the convergence is uniform in these parameters $a_1, \dots, a_k$.

We do not require the implied constants $C_{a_1,\dots,a_k}$, $c_{a_1,\dots,a_k}(N)$ to be effective. While the arguments presented in this paper are entirely effective, the bounds that arise in the Möbius and nilsequences conjecture MN(s), Conjecture 8.5, inevitably involve Siegel zeroes and are thus ineffective with current technology. They are, however, effective if GRH is assumed.

The $o$-notation being reserved for functions which become small as $N \to \infty$, we introduce a further notation, the $\kappa$-notation, for functions which tend to zero as their parameters become small. Thus $\kappa(\delta)$ denotes a quantity which tends to $0$ as $\delta \to 0$. Once again the $\kappa$ may be subscripted by other parameters, indicating a rate of decay which depends on those parameters.

We will frequently take advantage of the fact that two errors involving different parameters can often be concatenated by choosing one of the parameters properly. To give a typical example, suppose we have a quantity $Q(N)$ for which we have established the bound

$$Q(N) \le o_\varepsilon(1) + \kappa(\varepsilon), \tag{3.1}$$

where $\varepsilon \in (0, 1)$ is a parameter at our disposal and $Q(N)$ does not depend on $\varepsilon$. Then we can concatenate the two error terms by optimising in $\varepsilon$ and conclude that

$$Q(N) = o(1). \tag{3.2}$$

Indeed for fixed $\varepsilon$ one may choose $N$ so large that the $o_\varepsilon(1)$ term in (3.1) is at most $\varepsilon$. This means that $Q(N) \le \varepsilon + \kappa(\varepsilon)$, still a function of the form $\kappa(\varepsilon)$. Since $\varepsilon$ can be as small as one likes, one obtains $Q(N) = o(1)$. Note that this kind of trick was already used to deduce Conjecture 1.4 from Conjecture 1.2.

If $A$ is a finite non-empty set and $f : A \to \mathbb{C}$ is a function, we write $|A|$ for the cardinality of $A$ and $\mathbb{E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x)$ for the average of $f$ on $A$. We extend this notation to functions of several variables in the obvious manner, thus for instance

$$\mathbb{E}_{x \in A, y \in B} f(x, y) := \frac{1}{|A||B|} \sum_{x \in A} \sum_{y \in B} f(x, y).$$

For any integer $N \ge 1$, we use $[N]$ to denote the discrete interval $[N] := \{1, \dots, N\}$, while $\mathbb{Z}_N$ denotes the cyclic group $\mathbb{Z}_N := \mathbb{Z}/N\mathbb{Z}$. At some places in the argument it will be convenient to pass from intervals $[N]$ to cyclic groups $\mathbb{Z}_{N'}$, possibly after modifying $N$ by a constant multiplicative factor.
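As a concrete instance of the concatenation of errors in (3.1) and (3.2), suppose, purely for illustration (these particular error terms are not taken from the argument), that the $o_\varepsilon(1)$ term is $1/(\varepsilon \log N)$ and that $\kappa(\varepsilon) = \varepsilon$. Choosing $\varepsilon$ as a function of $N$ then even gives an explicit rate:

$$Q(N) \le \frac{1}{\varepsilon \log N} + \varepsilon, \qquad \varepsilon := (\log N)^{-1/2} \quad \Longrightarrow \quad Q(N) \le 2 (\log N)^{-1/2} = o(1).$$

When the dependence of the $o_\varepsilon(1)$ term on $\varepsilon$ is not explicit, as is typically the case here, the same diagonal choice of $\varepsilon$ still works but only yields $o(1)$ without a rate.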

The letter $i$ is too important for use only as the square-root of minus one. Occasionally it will be used in this capacity and as an index in the same formula. This ought not to cause any confusion; an earlier attempt to write $\sqrt{-1}$ throughout made several of our formulae rather difficult to read.

In an earlier version of the paper we used vector notation such as $\vec{x}$ to indicate that certain elements lay in product spaces such as $\mathbb{Z}^d$. It was discovered that consistent use of this notation rendered certain of our expressions rather difficult to read, and so we have abandoned this practice. The reader may, at certain times, need to carefully remind herself of the spaces in which certain variables take values.

Important convention. For the rest of the paper, the parameters $t, d, s, L$ (which control the size and complexity of our system $\Psi = (\psi_i)_{i \in [t]}$ of linear forms) will be regarded as fixed. All implied constants in the $\ll$, $O()$, or $o()$ notation are understood to be dependent on these parameters $t, d, s, L$, even if we do not subscript them explicitly. In particular, any quantity depending just on $t, d, s, L$ is automatically $O(1)$. Note however that we do allow our system $\Psi$ to vary (for instance, in order to encompass Vinogradov's three-primes theorem, $\Psi$ must depend on $N$), and our estimates will be uniform in the choice of $\Psi$ so long as the parameters $t, d, s, L$ remain fixed.



4. Linear algebra reductions 

In this section we show how the Main Theorem implies Theorem 1.8, and also reduce the Main Theorem to the case in which the system $\Psi$ is placed in a suitable "normal form". More precisely, in this section we reduce both the Main Theorem and Theorem 1.8 to the simpler Theorem 4.5. Our methods here use only elementary linear algebra. In particular we do not require precise knowledge of exactly what the conjectures GI(s), MN(s) are at this point. We will however restrict to the case $s \ge 1$, because the case $s = 0$ follows from the $s = 1$ case (note that the conjectures GI(1), MN(1) are known to be true), and in any event the $s = 0$ case can be easily deduced from (1.5). This allows us to avoid some degeneracies later on.

Derivation of Theorem 1.8 from the Main Theorem. Suppose that we are in the situation of Theorem 1.8. Because $A$ has full rank, and $b$ lies in the set $A\mathbb{Z}^t$, the set $\Gamma := \{x \in \mathbb{Z}^t : Ax = b\}$ is a non-empty affine sublattice of $\mathbb{Z}^t$ of rank $d := t - s$. Since $b = O(N)$ and $A$ has bounded integer coordinates, it is not hard to see that $\Gamma$ must contain at least one point of magnitude $O(N)$. For instance, one could apply any standard linear algebra algorithm to produce an element of $\Gamma$, which will then necessarily have magnitude $O(N)$ from inspection of the algorithm. Furthermore, the generators of this lattice can also be chosen to have magnitude $O(1)$, again by applying standard linear algebra algorithms. Thus we have a multiplicity-free parameterisation $\Gamma = \Psi(\mathbb{Z}^{t-s})$ for some system of affine-linear forms $\Psi = (\psi_1, \dots, \psi_t)$ with $\|\Psi\|_N = O(1)$.
The full rank of $A$ ensures that the codimension of $\Psi(\mathbb{Z}^{t-s})$ is the minimal value, namely $s$. We can then write the left-hand side of (1.9) as

$$\sum_{n \in K' \cap \mathbb{Z}^{t-s}} \prod_{i \in [t]} \Lambda(\psi_i(n)),$$

where $K' \subseteq \mathbb{R}^{t-s}$ is the convex body

$$K' := \{y \in \mathbb{R}^{t-s} : \Psi(y) \in K\}.$$

Note that $K'$ is contained in the box $[-N', N']^{t-s}$ for some $N' = O(N)$.

If two of the $\psi_i$ were affinely dependent then two of the coordinates of lattice points in $\Gamma$ would obey an affine-linear constraint. This is equivalent to the row space of $A$ containing a non-trivial vector with at most two non-zero entries, which is contrary to assumption. From Lemma 1.6 we conclude that $\Psi$ has complexity at most $s$. We now invoke the Main Theorem. Comparing (1.7) with (1.9) we see that we will be done as soon as we show that $\alpha_\infty \prod_p \alpha_p = \beta_\infty \prod_p \beta_p + o(N^d)$. For any fixed prime $p$, the set $\{n \in \mathbb{Z}^d : \Psi(n) \in [-M, M]^t\}$ is asymptotically uniformly distributed in residue classes in $\mathbb{Z}_p^d$ in the limit $M \to \infty$, and hence $\alpha_p = \beta_p$. Since the product $\prod_p \beta_p$ is either zero or comparable to 1, it thus suffices to show that $\alpha_\infty = \beta_\infty + o(N^d)$. But this follows
from (D. □ 
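The hypothesis invoked above, that no non-zero vector in the row space of $A$ has at most two non-zero entries, can be tested mechanically. Since $A$ has full row rank $s$, a non-zero combination of rows supported on two coordinates $\{i, j\}$ exists exactly when deleting columns $i$ and $j$ lowers the rank below $s$. The sketch below applies this criterion with exact rational elimination; the helper names are ours and are not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of an integer matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def row_space_has_short_vector(A):
    """True if some non-zero vector in the row space of A has <= 2 non-zero entries.

    Assumes the s x t matrix A has full row rank s: a row combination y^T A
    supported on {i, j} exists iff the columns outside {i, j} have rank < s.
    """
    s, t = len(A), len(A[0])
    for i, j in combinations(range(t), 2):
        reduced = [[x for k, x in enumerate(row) if k not in (i, j)] for row in A]
        if rank(reduced) < s:
            return True
    return False


print(row_space_has_short_vector([[1, -2, 1]]))   # False: x1 + x3 = 2 x2 is admissible
print(row_space_has_short_vector([[1, -1]]))      # True: a "binary" twin-prime type equation
```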



Elimination of the Archimedean factor. We now return to the task of proving the Main Theorem, using some simple linear algebra to obtain some reductions.

First of all, we can use the following easy trick to hide the "archimedean factor" $\beta_\infty$ from view. Clearly we may intersect $K$ with the convex set $\Psi^{-1}((\mathbb{R}^+)^t)$ and reduce to the case where $\psi_i > 0$ on $K$; in this case $\beta_\infty$ is simply the volume of $K$. In light of (1.3) and the boundedness of the product $\prod_p \beta_p$, we can then rewrite (1.7) as

$$\sum_{n \in K \cap \mathbb{Z}^d} \Big( \prod_{i \in [t]} \Lambda(\psi_i(n)) - \prod_p \beta_p \Big) = o(N^d). \tag{4.1}$$

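Remark. One can easily verify the "local" version of this formula,

$$\sum_{n \in K \cap \mathbb{Z}^d} \Big( \prod_{i \in [t]} \Lambda_{\mathbb{Z}_p}(\psi_i(n)) - \beta_p \Big) = O_p(N^{d-1});$$

indeed this is a variant of the identity $\alpha_p = \beta_p$ discussed previously.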

It turns out to be convenient to strengthen the condition $\psi_i > 0$ slightly, say to $\psi_i > N^{9/10}$. The exact power of $N$ is not important so long as it lies between 0 and 1. One can easily verify, by estimating $\Lambda$ crudely by $\log N$, that for each $i$ the contribution of the case $0 < \psi_i(n) \le N^{9/10}$ to (4.1) is $o(N^d)$. We have thus reduced to showing the following.

Theorem 4.1 (Finite complexity generalised H-L conjecture, again). Let $s \ge 1$, and let $\Psi : \mathbb{Z}^d \to \mathbb{Z}^t$ be a system of affine-linear forms of complexity $s$. Suppose that the inverse Gowers-norm conjecture GI(s) and the Möbius and nilsequences conjecture MN(s) are true. Let $N > 1$ and suppose that $\|\Psi\|_N = O(1)$. Let $K \subseteq [-N, N]^d$ be a convex body such that $\psi_1, \dots, \psi_t > N^{9/10}$ on $K$. Then (4.1) holds.
Normal form reduction of the Main Theorem. We now reduce Theorem 4.1 further by placing the system in a convenient "normal form". We denote the standard basis of $\mathbb{Z}^d$ by $e_1, \dots, e_d$.

Definition 4.2 (Normal form). Let $\Psi = (\psi_1, \dots, \psi_t)$ be a system of affine-linear forms on $\mathbb{Z}^d$, and let $s \ge 0$. We say that $\Psi$ is in $s$-normal form if for every $i \in [t]$ there exists a collection $J_i \subseteq \{e_1, \dots, e_d\}$ of basis vectors of cardinality $|J_i| \le s + 1$ such that $\prod_{e \in J_i} \dot\psi_{i'}(e)$ is non-zero for $i' = i$ and vanishes otherwise. (Here $\dot\psi_{i'}$ denotes the homogeneous linear part of $\psi_{i'}$.)

If a system is in $s$-normal form, then we can explicitly see that for each $i \in [t]$ the $i$-complexity of the system is at most $s$. Indeed, we can cover the $t - 1$ forms $\{\psi_j : j \in [t] \setminus \{i\}\}$ by $|J_i|$ classes, where the class associated to a basis vector $e \in J_i$ is simply the collection of all the forms $\psi_{i'}$ for which $\dot\psi_{i'}(e) = 0$; since $\dot\psi_i(e) \neq 0$, we see that $\psi_i$ cannot lie in the affine span of such a class. It is, therefore, necessary that a system be of complexity at most $s$ before admitting an $s$-normal form. We now investigate the converse relationship, beginning with some illustrative examples.

Example 10. The system of affine-linear forms $\Psi(n_1, n_2) := (n_1, n_1 + n_2, n_1 + 2n_2, n_1 + 3n_2)$, which counts progressions of length four, has complexity 2 but is not in $s$-normal form for any $s$. However the system of affine-linear forms

$$\Psi'(n_1, n_2, n_3, n_4) := (n_2 + 2n_3 + 3n_4,\ -n_1 + n_3 + 2n_4,\ -2n_1 - n_2 + n_4,\ -3n_1 - 2n_2 - n_3),$$

which also counts progressions of length four, is also of complexity 2 and is now in 2-normal form.
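Definition 4.2 is a finite combinatorial condition on the linear coefficients, so for small systems one can simply search for the sets $J_i$. The sketch below is our own illustration (the function name is hypothetical, and coordinates are 0-indexed, so $e_k$ becomes index $k - 1$). It confirms the claims of Example 10: the original progression system admits no normal form, while $\Psi'$ is in 2-normal form but not in 1-normal form.

```python
from itertools import combinations


def normal_form_witness(rows, s):
    """Search for sets J_i as in Definition 4.2.

    rows[i] holds the linear coefficients of psi_i (constant terms play no
    role, since only linear parts are evaluated on basis vectors).  Returns a
    list of coordinate-index tuples J_0, ..., J_{t-1}, or None if for some i
    no set of at most s + 1 basis vectors works.
    """
    t, d = len(rows), len(rows[0])
    witnesses = []
    for i in range(t):
        # J_i must avoid coordinates where psi_i vanishes, so its product is non-zero.
        support = [k for k in range(d) if rows[i][k] != 0]
        found = None
        for size in range(s + 2):                      # |J_i| <= s + 1
            for J in combinations(support, size):
                # Every other form must vanish on at least one basis vector in J.
                if all(any(rows[j][k] == 0 for k in J) for j in range(t) if j != i):
                    found = J
                    break
            if found is not None:
                break
        if found is None:
            return None
        witnesses.append(found)
    return witnesses


ap4 = [[1, 0], [1, 1], [1, 2], [1, 3]]                 # Psi in Example 10
ap4_normal = [[0, 1, 2, 3], [-1, 0, 1, 2],
              [-2, -1, 0, 1], [-3, -2, -1, 0]]         # Psi' in Example 10
print(normal_form_witness(ap4, 3))           # None
print(normal_form_witness(ap4_normal, 1))    # None
print(normal_form_witness(ap4_normal, 2))    # [(1, 2, 3), (0, 2, 3), (0, 1, 3), (0, 1, 2)]
```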

Example 11. The system in Example 2, which counts $(d-1)$-dimensional cubes, has complexity $d - 2$ but is not in $s$-normal form for any $s$. However the system

$$\Psi'(n_1, \dots, n_{d-1}, n'_1, \dots, n'_{d-1}) := \Big( \sum_{i \in A} n_i + \sum_{i \in [d-1] \setminus A} n'_i \Big)_{A \subseteq [d-1]},$$

which also counts $(d-1)$-dimensional cubes, is also of complexity at most $d - 2$ and is now in $(d-2)$-normal form.

Example 12. Let $t := d(d+1)/2$ and consider the system of affine-linear forms

$$\Psi(n_1, \dots, n_d) := (n_i + n_j + 1)_{1 \le i \le j \le d}$$

from Example |H This system has complexity 1 but is not in s-normal form for any s. 
However, if we increase the number of parameters from $d$ to $2d$, and consider the system

$$\Psi'(n_1, \dots, n_d, n_{d+1}, \dots, n_{2d}) := \Big( n_i + n_j + 1 + n_{d+i} + n_{d+j} - \sum_{k=d+1}^{2d} n_k \Big)_{1 \le i \le j \le d},$$

which counts the same type of pattern, then this system still has complexity 1 and is now in 1-normal form. Indeed for the off-diagonal forms $i < j$ we may use the basis vectors $e_i, e_j$, while for the diagonal forms $i = j$ we may use the basis vectors $e_i, e_{d+i}$.
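The witness sets claimed in Example 12 can be checked mechanically for small $d$. The sketch below (helper names are ours; coordinates are 0-indexed, so $e_i$ becomes index $i - 1$) builds the linear parts of $\Psi'$, drops the constant term 1, which plays no role in Definition 4.2, and verifies that each proposed pair of basis vectors separates its form from all the others.

```python
from itertools import combinations_with_replacement


def example12_rows(d):
    """Linear parts of Psi' from Example 12, keyed by the pair (i, j), i <= j.

    Row (i, j) is e_i + e_j + e_{d+i} + e_{d+j} - (e_d + ... + e_{2d-1}).
    """
    rows = {}
    for i, j in combinations_with_replacement(range(d), 2):
        row = [0] * (2 * d)
        row[i] += 1
        row[j] += 1
        row[d + i] += 1
        row[d + j] += 1
        for k in range(d, 2 * d):
            row[k] -= 1
        rows[(i, j)] = row
    return rows


def check_example12(d):
    """Verify the claimed sets J: {e_i, e_j} for i < j and {e_i, e_{d+i}} for i = j."""
    rows = example12_rows(d)
    for (i, j), row in rows.items():
        J = (i, j) if i < j else (i, d + i)
        if any(row[k] == 0 for k in J):
            return False          # the form itself must not vanish on J
        for other, orow in rows.items():
            if other != (i, j) and all(orow[k] != 0 for k in J):
                return False      # every other form must vanish somewhere on J
    return True


print([check_example12(d) for d in range(1, 5)])   # [True, True, True, True]
```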

Remark. Informally speaking, if $(\psi_1, \dots, \psi_t)$ is in $s$-normal form, then for each form $\psi_i$ there exists a set of at most $s + 1$ variables $(n_j)_{j \in J_i}$ such that $\psi_i$ is the only form which truly utilises all the variables at once. As we shall see later, this property will be convenient for establishing a "generalised von Neumann theorem" (Proposition 7.1), which roughly speaking controls averages such as (4.1) in terms of Gowers uniformity norms, which we shall recall in Appendix B.

Now we investigate the converse question, namely whether every system of complexity 
s has a normal form representation. To formalise this we first need the concept of 
extending a system of affine-linear forms by adding some "dummy" parameters: 

Definition 4.3 (Extensions). Let $\Psi : \mathbb{Z}^d \to \mathbb{Z}^t$ be a system of affine-linear forms. An extension of this system is a system $\Psi' : \mathbb{Z}^{d'} \to \mathbb{Z}^t$ with $d' \ge d$, such that

$$\Psi'(\mathbb{Z}^{d'}) = \Psi(\mathbb{Z}^d) \tag{4.2}$$

and furthermore if we identify $\mathbb{Z}^d$ with the subset $\mathbb{Z}^d \times \{0\}^{d'-d}$ of $\mathbb{Z}^{d'}$ in the obvious manner, then $\Psi$ is the restriction of $\Psi'$ to $\mathbb{Z}^d$.

We note that if $\Psi$ is in $s$-normal form at $i$, and if $\Psi'$ is an extension of $\Psi$, then $\Psi'$ is also in $s$-normal form at $i$. By the same token, we note also that if $\Psi = (\psi_i)_{i=1}^t$ is in $s$-normal form, then so is any subsystem $(\psi_i)_{i \in J}$ with $J \subseteq \{1, \dots, t\}$.

Example 13. In Example 12, $\Psi'$ is an extension of $\Psi$. This is not quite the case in Examples 10 and 11, because $\Psi$ is not a restriction of $\Psi'$. However in these two examples, the direct sum $\Psi \oplus \Psi'$ of the two systems is both an extension of $\Psi$ and in normal form; for instance, in Example 10 the system

$$\Psi \oplus \Psi'(n_1, n_2, n'_1, n'_2, n'_3, n'_4) := (n_1 + n'_2 + 2n'_3 + 3n'_4,\ n_1 + n_2 - n'_1 + n'_3 + 2n'_4,\ n_1 + 2n_2 - 2n'_1 - n'_2 + n'_4,\ n_1 + 3n_2 - 3n'_1 - 2n'_2 - n'_3)$$

is an extension of $\Psi$ which is in 2-normal form.

Lemma 4.4 (Existence of normal forms). Let $\Psi : \mathbb{Z}^d \to \mathbb{Z}^t$ be a system of affine-linear forms of some finite complexity $s$. Then there exists an extension $\Psi' : \mathbb{Z}^{d'} \to \mathbb{Z}^t$ of $\Psi$ which is in $s$-normal form, where $d' = O(1)$. Furthermore if the original system $\Psi$ had size $\|\Psi\|_N = O(1)$, then the same is true of the extended system $\Psi'$.

Proof. Let us fix $i \in [t]$. We shall obtain an extension $\Psi' : \mathbb{Z}^{d'} \to \mathbb{Z}^t$ of $\Psi$ which is in $s$-normal form at $i$, by which we mean that there is a collection $J_i \subseteq \{e_1, \dots, e_{d'}\}$ of basis vectors of cardinality $|J_i| \le s + 1$ such that $\prod_{e \in J_i} \dot\psi'_{i'}(e)$ is non-zero for $i' = i$ and vanishes otherwise. Applying this extension procedure once for each value of $i$ we shall obtain the result.

By hypothesis, $\Psi$ has $i$-complexity at most $s$, and so we can cover $[t] \setminus \{i\}$ by $s + 1$ classes $A_1, \dots, A_{s+1}$ such that $\psi_i$ is not in the affine-linear span of $\{\psi_j : j \in A_k\}$ for $k \in [s+1]$. In particular, this implies that one can find vectors $f_1, \dots, f_{s+1} \in \mathbb{Q}^d$ which "witness this fact", that is to say such that $\dot\psi_j(f_k) = 0$ and $\dot\psi_i(f_k) \neq 0$ for all $k \in [s+1]$ and $j \in A_k$. By clearing denominators we can take $f_1, \dots, f_{s+1} \in \mathbb{Z}^d$. Since $\Psi$ has bounded integer coefficients we also see that $f_1, \dots, f_{s+1} = O(1)$. If we now let $d' := d + s + 1$ and let $\Psi' : \mathbb{Z}^{d'} \to \mathbb{Z}^t$ be the system

$$\Psi'(n, m_1, \dots, m_{s+1}) := \Psi(n + m_1 f_1 + \dots + m_{s+1} f_{s+1})$$

for all $n \in \mathbb{Z}^d$ and $m_1, \dots, m_{s+1} \in \mathbb{Z}$, we easily verify that $\Psi'$ satisfies the desired $s$-normal form property at $i$ (take $J_i$ to be the basis vectors corresponding to $m_1, \dots, m_{s+1}$, on which the linear part of $\psi'_j$ takes the values $\dot\psi_j(f_1), \dots, \dot\psi_j(f_{s+1})$), as well as the size bounds on $\Psi'$. By repeating this procedure once for each $i$ we obtain the claim. □
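The construction in this proof is easy to carry out on the linear coefficients: the new variable $m_k$ enters the form $\psi_j$ with coefficient $\dot\psi_j(f_k)$. The sketch below (hypothetical helper `extend`) applies it to the four-term progression system of Example 10 at $i = 1$ (index 0 in the code). We take the classes $A_k$ to be the singletons $\{n_1 + k n_2\}$, $k = 1, 2, 3$, and the witnesses $f_k = (k, -1)$, which we chose by hand; afterwards the three new coordinates form a set $J_1$ as required, with $s = 2$ and $d' = 5$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def extend(rows, witnesses):
    """Linear parts of Psi'(n, m_1, ..., m_k) := Psi(n + m_1 f_1 + ... + m_k f_k).

    rows[j] holds the linear coefficients of psi_j.  Constant terms are left
    unchanged by the construction and are omitted here.
    """
    return [row + [dot(row, f) for f in witnesses] for row in rows]


ap4 = [[1, 0], [1, 1], [1, 2], [1, 3]]
fs = [[1, -1], [2, -1], [3, -1]]          # f_k kills n1 + k n2 but not n1
ext = extend(ap4, fs)
print(ext)   # [[1, 0, 1, 2, 3], [1, 1, 0, 1, 2], [1, 2, -1, 0, 1], [1, 3, -2, -1, 0]]

J = (2, 3, 4)                                                  # coordinates of m_1, m_2, m_3
assert all(ext[0][k] != 0 for k in J)                          # psi'_1 is non-zero on J
assert all(any(ext[j][k] == 0 for k in J) for j in (1, 2, 3))  # each other form vanishes somewhere on J
assert [row[:2] for row in ext] == ap4                         # setting m = 0 recovers Psi
```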

Using this lemma it is not hard to show that, in order to prove the Main Theorem, it suffices to prove the following result for systems in $s$-normal form.

Theorem 4.5 (Primes in affine lattices in normal form). Let $s \ge 1$, and let $\Psi : \mathbb{Z}^d \to \mathbb{Z}^t$ be a system of affine-linear forms of complexity $s$ in $s$-normal form. Suppose that the inverse Gowers-norm conjecture GI(s) and the Möbius and nilsequences conjecture MN(s) are true. Let $N > 1$ and suppose that $\|\Psi\|_N = O(1)$. Let $K \subseteq [-N, N]^d$ be a convex body such that $\psi_1, \dots, \psi_t > N^{9/10}$ on $K$. Then (4.1) holds, that is to say

$$\sum_{n \in K \cap \mathbb{Z}^d} \Big( \prod_{i \in [t]} \Lambda(\psi_i(n)) - \prod_p \beta_p \Big) = o(N^d).$$

Proof of the Main Theorem assuming Theorem 4.5. By our earlier reduction it suffices to show that Theorem 4.1 holds. Let $\Psi$, $K$, $N$ be as in Theorem 4.1. We may assume $N$ large, as the claim is trivial for $N$ small.

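Let $\Psi' : \mathbb{Z}^{d'} \to \mathbb{Z}^t$ be the $s$-normal form extension given by Lemma 4.4. An inspection of the proof of that lemma allows us to find vectors $f_{d+1}, \dots, f_{d'} \in \mathbb{Z}^d$ of magnitude $O(1)$ such that

$$\Psi'(n, m_{d+1}, \dots, m_{d'}) := \Psi(n + m_{d+1} f_{d+1} + \dots + m_{d'} f_{d'}).$$

(One can also deduce the existence of these vectors directly from the conclusions of Lemma 4.4.) We observe that the local factors $\beta'_p$ associated to the system $\Psi'$ are precisely the same as the local factors $\beta_p$ associated to $\Psi$; this is ultimately due to the translation-invariance of $\mathbb{Z}_p$. Now let $K' \subseteq \mathbb{R}^{d'}$ be the convex body

$$K' := \{(n, m_{d+1}, \dots, m_{d'}) \in \mathbb{R}^d \times [-N, N]^{d'-d} : n + m_{d+1} f_{d+1} + \dots + m_{d'} f_{d'} \in K\}.$$

This is contained in $[-N', N']^{d'}$ for some $N' = O(N)$. Applying Theorem 4.5, we conclude that

$$\sum_{(n, m_{d+1}, \dots, m_{d'}) \in K' \cap \mathbb{Z}^{d'}} \Big( \prod_{i \in [t]} \Lambda(\psi'_i(n, m_{d+1}, \dots, m_{d'})) - \prod_p \beta'_p \Big) = o(N^{d'}).$$

Making the change of variables $r := n + m_{d+1} f_{d+1} + \dots + m_{d'} f_{d'}$, the left-hand side can be simplified to

$$(2N + 1)^{d'-d} \sum_{r \in K \cap \mathbb{Z}^d} \Big( \prod_{i \in [t]} \Lambda(\psi_i(r)) - \prod_p \beta_p \Big),$$

and (4.1) follows upon dividing out by $(2N + 1)^{d'-d}$. □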



This completes our linear algebra manipulations. It now remains to prove Theorem 4.5, a task which will occupy the remainder of the paper.
5. The W-trick



In the preceding section we were able to eliminate the archimedean factor $\beta_\infty$ by assuming that $\psi_1, \dots, \psi_t$ were non-negative on $K$, and using the formulation (4.1). Now we use a somewhat similar trick, which we term the "W-trick". This was a vital trick in [221 EU [25], where it was used in similar fashion to eliminate the local factors $\beta_p$. Once again, the reductions here will not actually require any knowledge of the two conjectures GI(s) and MN(s), which we shall finally introduce in §8.

Important convention. From now on in the paper, fix some slowly growing function $w = w(N)$. Any function such that $w(N) \le \frac{1}{2} \log\log N$ and $\lim_{N \to \infty} w(N) = \infty$ would suffice; for the sake of definiteness we shall conservatively set $w := \log\log\log N$. The exact choice of $w$ is only relevant for determining the decay rate of the $o()$ terms, but as our final decay bounds are ineffective we will not attempt to optimise in $w$.

We define the quantity $W = W(w)$ by

$$W := \prod_{p \le w} p;$$

since $w \le \frac{1}{2} \log\log N$ we have $W = O(\log^{2/3} N)$. For each $b \in [W]$ with $\gcd(b, W) = 1$, let $\Lambda_{b,W} : \mathbb{Z}^+ \to \mathbb{R}^+$ be the function

$$\Lambda_{b,W}(n) := \frac{\phi(W)}{W} \Lambda(Wn + b), \tag{5.1}$$

where we recall that $\phi(W) = \#\{b \in [W] : \gcd(b, W) = 1\}$ is the Euler totient function of $W$. Thus for instance the prime number theorem in APs (1.5) asserts¹ that $\Lambda_{b,W}(n)$ has average value 1 as $n \to \infty$. Actually it will be slightly more convenient to work with the variant

$$\Lambda'_{b,W}(n) := \frac{\phi(W)}{W} \Lambda'(Wn + b),$$

where $\Lambda'$ is the restriction of $\Lambda$ to the primes, i.e. $\Lambda'(p) = \log p$ for all primes $p$ and $\Lambda'(n) = 0$ for non-prime $n$. Thus $\Lambda'$ only differs from $\Lambda$ on the (negligible) set of prime powers $p^2, p^3, \dots$.
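The normalisation $\phi(W)/W$ in $\Lambda'_{b,W}$ can be checked numerically: for a fixed small $w$, each residue class $b$ coprime to $W$ should give an empirical average close to 1. The sketch below takes $w = 5$ (so $W = 30$ and $\phi(W) = 8$) and averages over $1 \le n \le M$. The values of $w$ and $M$ and all function names are our own illustrative choices, and a fixed $w$ of course says nothing about the regime $w \to \infty$ used in the paper.

```python
import math


def prime_flags(n):
    """Sieve of Eratosthenes: flags[k] == 1 exactly when 0 <= k <= n is prime."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return flags


def w_trick_averages(w, M):
    """Empirical averages of Lambda'_{b,W}(n) = (phi(W)/W) Lambda'(Wn + b) over 1 <= n <= M.

    Returns W, phi(W) and a dict mapping each b in [W] coprime to W to its average.
    """
    small = [p for p in range(2, w + 1) if all(p % q for q in range(2, p))]
    W = math.prod(small)
    phi = math.prod(p - 1 for p in small)          # W is squarefree
    flags = prime_flags(W * M + W)
    averages = {}
    for b in range(1, W + 1):
        if math.gcd(b, W) != 1:
            continue
        total = sum(math.log(W * n + b) for n in range(1, M + 1) if flags[W * n + b])
        averages[b] = (phi / W) * total / M
    return W, phi, averages


W, phi, avg = w_trick_averages(5, 20000)
print(W, phi)        # 30 8
print(sorted(avg))   # [1, 7, 11, 13, 17, 19, 23, 29]
```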

Recall that we reduced the task of proving the Main Theorem to that of proving Theorem 4.5. We now make a further reduction, showing that it suffices to prove the following.

Theorem 5.1 (W-tricked primes in affine lattices). Let $s \ge 1$, and suppose that $\Psi = (\psi_1, \dots, \psi_t) : \mathbb{Z}^d \to \mathbb{Z}^t$ is a system of affine-linear forms in $s$-normal form and with $\|\Psi\|_N = O(1)$. Suppose that the inverse Gowers-norm conjecture GI(s) and the Möbius and nilsequences conjecture MN(s) are true. Let $K \subseteq [-N, N]^d$ be any convex body on which $\psi_1, \dots, \psi_t > N^{9/10}$. Then for any $b_1, \dots, b_t \in [W]$ which are coprime to $W$, we have

$$\sum_{n \in K \cap \mathbb{Z}^d} \Big( \prod_{i \in [t]} \Lambda'_{b_i,W}(\psi_i(n)) - 1 \Big) = o(N^d).$$



¹In order to obtain this statement for $w$ as large as $\frac{1}{2} \log\log N$, one needs a more quantitative version of (1.5), such as the Siegel-Walfisz theorem.
Remark. Note that the $o(N^d)$ bound on the right-hand side does not depend on $b_1, \dots, b_t$. The philosophy here is that the functions $\Lambda'_{b_i,W}$ should behave "pseudorandomly" with average value one; this is in contrast with $\Lambda$, which has many local irregularities with respect to small moduli which necessitate the introduction of the local factors $\beta_p$. This philosophy of passing from $\Lambda$ to the more uniformly distributed $\Lambda'_{b,W}$ underlies the arguments in [24]. In §12 we will have to invert the W-trick and deduce some correlation estimates on $\Lambda'_{b,W}$ from those on $\Lambda$.



Proof of the Main Theorem assuming Theorem 5.1. By previous reductions, it suffices to establish Theorem 4.5. Let $\Psi$, $K$ be as in Theorem 4.5. We may then replace $\Lambda$ by $\Lambda'$, as the contribution of the prime powers is easily seen to be negligible. To prove (4.1), it then suffices by (1.3) to show that

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \Lambda'(\psi_i(n)) = \operatorname{Vol}_d(K) \prod_p \beta_p + o(N^d). \tag{5.2}$$

We may take $N$ to be large, since the claim is trivial otherwise.

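Now the upper bound on $w$ ensures that $W \le \log N$. From Lemma 1.3, followed by the multiplicativity of the local factors $\beta$, we have

$$\prod_p \beta_p = \prod_{p \le w} \beta_p + o(1) = \beta_W + o(1);$$

since $\operatorname{Vol}_d(K) = O(N^d)$, we conclude that

$$\operatorname{Vol}_d(K) \prod_p \beta_p = \operatorname{Vol}_d(K)\, \beta_W + o(N^d).$$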

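Now let $A$ be the set

$$A := \{a \in [W]^d : \gcd(\psi_i(a), W) = 1 \text{ for all } i \in [t]\}.$$

Then from (1.6) we have $\beta_W = (W/\phi(W))^t |A| / W^d$, which implies that

$$\operatorname{Vol}_d(K) \prod_p \beta_p = \sum_{a \in A} \Big( \frac{W}{\phi(W)} \Big)^t \frac{\operatorname{Vol}_d(K)}{W^d} + o(N^d). \tag{5.3}$$

Also, from Lemma 1.3 we know that $\beta_W$ is comparable to 1, and so

$$|A| \ll \Big( \frac{\phi(W)}{W} \Big)^t W^d. \tag{5.4}$$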
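The identity $\beta_W = (W/\phi(W))^t |A| / W^d$ is an instance of the Chinese remainder theorem: the condition defining $A$ splits into independent conditions modulo each prime $p \le w$. The sketch below checks this exactly, with rational arithmetic, for three-term progressions and $W = 30$. It assumes that the local factors have the form $\beta_p = \mathbb{E}_{n \in \mathbb{Z}_p^d} \prod_i \Lambda_{\mathbb{Z}_p}(\psi_i(n))$ with $\Lambda_{\mathbb{Z}_p}(x) = \frac{p}{p-1} 1_{p \nmid x}$; the precise definition is (1.6), which is not reproduced here, so this normalisation is our assumption. Function names are ours.

```python
from fractions import Fraction
from itertools import product
from math import gcd, prod


def evaluate(form, n):
    coeffs, const = form
    return sum(c * x for c, x in zip(coeffs, n)) + const


def beta(forms, p):
    """Assumed local factor: (p/(p-1))^t times the proportion of n in Z_p^d
    at which no form vanishes mod p."""
    d, t = len(forms[0][0]), len(forms)
    good = sum(1 for n in product(range(p), repeat=d)
               if all(evaluate(f, n) % p for f in forms))
    return Fraction(p, p - 1) ** t * Fraction(good, p ** d)


def beta_W(forms, primes):
    """(W/phi(W))^t |A| / W^d with A = {a in [W]^d : gcd(psi_i(a), W) = 1 for all i}."""
    W = prod(primes)
    phi = prod(p - 1 for p in primes)
    d, t = len(forms[0][0]), len(forms)
    size_A = sum(1 for a in product(range(1, W + 1), repeat=d)
                 if all(gcd(evaluate(f, a), W) == 1 for f in forms))
    return Fraction(W, phi) ** t * Fraction(size_A, W ** d)


ap3 = [((1, 0), 0), ((1, 1), 0), ((1, 2), 0)]     # (n1, n1 + n2, n1 + 2 n2)
primes = [2, 3, 5]
print(beta_W(ap3, primes) == prod(beta(ap3, p) for p in primes))   # True
```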
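Next, note that by a simple expansion we have

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \Lambda'(\psi_i(n)) = \sum_{a \in [W]^d} \sum_{\substack{n \in \mathbb{Z}^d \\ Wn + a \in K}} \prod_{i \in [t]} \Lambda'(\psi_i(Wn + a)). \tag{5.5}$$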

If $a$ does not lie in $A$, then $\psi_i(Wn + a)$ will not be coprime to $W$ for some $i \in [t]$. Since $\psi_i(Wn + a) > N^{9/10}$ by hypothesis, and $W$ is so small compared to $N$, we see that $\Lambda'(\psi_i(Wn + a)) = 0$ (a prime which is not coprime to $W$ is at most $w$). Thus we may restrict $a$ to $A$. Now for each $a \in A$ and $i \in [t]$, we can write

$$\psi_i(Wn + a) = W \psi_{i,a}(n) + b_i(a),$$

where $b_i(a)$ lies in $[W]$ and is coprime to $W$, while $\psi_{i,a}$ is a translate of $\psi_i$ whose constant term $\psi_{i,a}(0)$ is $O(N/W)$. Indeed $b_i(a)$ is simply the remainder formed when dividing $\psi_i(a)$ by $W$. We then have

$$\Lambda'(\psi_i(Wn + a)) = \frac{W}{\phi(W)} \Lambda'_{b_i(a),W}(\psi_{i,a}(n)).$$
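The bookkeeping behind $\psi_i(Wn + a) = W \psi_{i,a}(n) + b_i(a)$ is just division with remainder applied to $\psi_i(a)$: since $\psi_i(Wn + a) = W \dot\psi_i(n) + \psi_i(a)$, writing $\psi_i(a) = Wq + b$ gives $\psi_{i,a}(n) = \dot\psi_i(n) + q$. A minimal sketch follows, with a hypothetical helper name and forms encoded as (coefficients, constant) pairs. Python's `divmod` returns a remainder in $[0, W)$; when $\psi_i(a)$ is coprime to $W > 1$ this remainder is non-zero, so it lies in $[W]$ as in the text.

```python
import math
import random


def evaluate(coeffs, const, n):
    return sum(c * x for c, x in zip(coeffs, n)) + const


def w_decompose(coeffs, const, a, W):
    """Return (psi_a, b) with psi(W n + a) = W * psi_a(n) + b for every n.

    psi(W n + a) = W * (coeffs . n) + psi(a) and psi(a) = W q + b with
    0 <= b < W, so psi_a keeps the linear part of psi and has constant term q.
    """
    q, b = divmod(evaluate(coeffs, const, a), W)
    return (coeffs, q), b


coeffs, const, W = (1, 2), 7, 30
a = (30, 30)                                   # psi(a) = 97 = 3 * 30 + 7
(psi_a_coeffs, psi_a_const), b = w_decompose(coeffs, const, a, W)
print(psi_a_const, b)                          # 3 7
for _ in range(1000):
    n = (random.randint(-10**6, 10**6), random.randint(-10**6, 10**6))
    shifted = tuple(W * x + y for x, y in zip(n, a))
    assert evaluate(coeffs, const, shifted) == W * evaluate(psi_a_coeffs, psi_a_const, n) + b
```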
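It follows from (5.5) that

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \Lambda'(\psi_i(n)) = \sum_{a \in A} \Big( \frac{W}{\phi(W)} \Big)^t \sum_{\substack{n \in \mathbb{Z}^d \\ Wn + a \in K}} \prod_{i \in [t]} \Lambda'_{b_i(a),W}(\psi_{i,a}(n)). \tag{5.6}$$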

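However, from Theorem 5.1 (with $N$ replaced by some $N' = O(N/W)$, $K$ replaced by $(K - a)/W$, and $\Psi$ replaced by $(\psi_{1,a}, \dots, \psi_{t,a})$; note that $\|(\psi_{1,a}, \dots, \psi_{t,a})\|_{N'} = O(1)$) we have

$$\sum_{\substack{n \in \mathbb{Z}^d \\ Wn + a \in K}} \prod_{i \in [t]} \Lambda'_{b_i(a),W}(\psi_{i,a}(n)) = \sum_{\substack{n \in \mathbb{Z}^d \\ Wn + a \in K}} 1 + o\Big( \frac{N^d}{W^d} \Big).$$

Recalling (5.4), this together with (5.6) implies that

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \Lambda'(\psi_i(n)) = \sum_{a \in A} \Big( \frac{W}{\phi(W)} \Big)^t \sum_{\substack{n \in \mathbb{Z}^d \\ Wn + a \in K}} 1 + o(N^d). \tag{5.7}$$

On the other hand a simple volume-packing argument (cf. Appendix A) yields

$$\sum_{\substack{n \in \mathbb{Z}^d \\ Wn + a \in K}} 1 = \frac{\operatorname{Vol}_d(K)}{W^d} + O\Big( \frac{N^{d-1}}{W^{d-1}} \Big),$$

and so, using (5.4) once more together with (5.7), we see that

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \Lambda'(\psi_i(n)) = \sum_{a \in A} \Big( \frac{W}{\phi(W)} \Big)^t \frac{\operatorname{Vol}_d(K)}{W^d} + o(N^d).$$

Comparing this with (5.3), we obtain (5.2). This proves the claim. □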



Theorem 5.1, as we have just seen, implies the Main Theorem. Before moving on to the more substantial arguments in this paper, we give one further simple reduction, deducing Theorem 5.1 from the following variant.

Theorem 5.2 (Final technical reduction). Let $s \ge 1$, and let $\Psi = (\psi_1, \dots, \psi_t) : \mathbb{Z}^d \to \mathbb{Z}^t$ be a system of affine-linear forms in $s$-normal form. Suppose that the inverse Gowers-norm conjecture GI(s) and the Möbius and nilsequences conjecture MN(s) are true. Let $K \subseteq [-N, N]^d$ be any convex body on which $\psi_1, \dots, \psi_t > N^{9/10}$. Then for any $b_1, \dots, b_t \in [W]$ which are coprime to $W$, we have

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \big( \Lambda'_{b_i,W}(\psi_i(n)) - 1 \big) = o(N^d).$$
Indeed, Theorem 5.1 follows immediately from Theorem 5.2 by splitting each $\Lambda'_{b_i,W}$ as $(\Lambda'_{b_i,W} - 1) + 1$, expanding out the product in Theorem 5.1 and using Theorem 5.2 repeatedly, noting that any subsystem of $\Psi$ will still be in $s$-normal form. □
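Spelled out, the expansion used in this deduction is the identity, valid for each $n \in K \cap \mathbb{Z}^d$,

$$\prod_{i \in [t]} \Lambda'_{b_i,W}(\psi_i(n)) - 1 = \sum_{\emptyset \neq S \subseteq [t]} \prod_{i \in S} \big( \Lambda'_{b_i,W}(\psi_i(n)) - 1 \big),$$

obtained by writing each factor as $(\Lambda'_{b_i,W}(\psi_i(n)) - 1) + 1$; the term $S = \emptyset$ equals 1 and cancels the $-1$. Summing over $n \in K \cap \mathbb{Z}^d$, each of the $2^t - 1 = O(1)$ terms on the right is $o(N^d)$ by Theorem 5.2 applied to the subsystem $(\psi_i)_{i \in S}$, which gives the conclusion of Theorem 5.1.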

The remainder of the paper shall be devoted to establishing Theorem 5.2.

6. The enveloping sieve

In previous sections we have reduced matters to establishing a certain discorrelation estimate, Theorem 5.2, for the functions $\Lambda'_{b,W} - 1$. A major difficulty in the analysis here is that these functions are not bounded uniformly in $N$. However, as in [211 ES], we shall be able to import tools from sieve theory. In particular, we use the principle of the "enveloping sieve". This is a well-behaved function $\nu$, some constant multiple of which provides a pointwise bound for the functions $\Lambda'_{b,W} - 1$. Of course, the function $\nu$ will not be bounded as $N \to \infty$; however it does obey a number of very good correlation or pseudorandomness estimates which assert, roughly speaking, that $\nu$ "effectively behaves like" the bounded function 1.

To define the notion of pseudorandomness properly we recall the linear forms condition and correlation condition from [21], modified slightly for the application at hand. In the following three definitions we assume that $N$ is a large positive integer, and that $N' = N'(N)$ is a prime number of size $N < N' \le O_{s,t,d,L}(N)$.

Definition 6.1 (Measures). A measure on $\mathbb{Z}_{N'}$ is a function $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ (depending of course on $N'$ and hence on $N$) with

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}} \nu(n) = 1 + o(1). \tag{6.1}$$

Definition 6.2 (Linear forms condition). Let $\nu$ be a measure on $\mathbb{Z}_{N'}$, and let $m_0$, $d_0$ and $L_0$ be positive integer parameters. Then we say that $\nu$ satisfies the $(m_0, d_0, L_0)$-linear forms condition if the following holds: given $1 \le d \le d_0$, $1 \le t \le m_0$, and any finite complexity system $\Psi = (\psi_1, \dots, \psi_t)$ of affine-linear forms on $\mathbb{Z}^d$ with all coefficients of $\Psi$ bounded in magnitude by $L_0$, we have

$$\mathbb{E}_{x \in \mathbb{Z}_{N'}^d} \prod_{i \in [t]} \nu(\psi_i(x)) = 1 + o(1). \tag{6.2}$$

In this expression we induce the affine-linear forms $\psi_i : \mathbb{Z}_{N'}^d \to \mathbb{Z}_{N'}$ from their global counterparts $\psi_i : \mathbb{Z}^d \to \mathbb{Z}$ in the obvious manner.
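For very small $N'$ and $d$, the left-hand side of (6.2) can be evaluated directly. The sketch below (the helper name and the test function are our own, for illustration only) shows that (6.1) on its own is much weaker than (6.2). The function $\nu(n) = 1 + \cos(2\pi n/N')$ is non-negative and has average exactly 1, but a short trigonometric computation shows that its average along the finite complexity system $(x, y, x + y)$ equals $5/4$ for every odd prime $N'$, so it fails the linear forms condition.

```python
import math
from itertools import product


def linear_forms_average(nu, forms, N_prime):
    """Brute-force E_{x in Z_{N'}^d} prod_i nu(psi_i(x)), the left-hand side of (6.2).

    forms: list of (coeffs, const) pairs; each psi_i is reduced mod N_prime.
    The cost is N_prime**d * len(forms), so d and N_prime must be tiny.
    """
    d = len(forms[0][0])
    total = 0.0
    for x in product(range(N_prime), repeat=d):
        value = 1.0
        for coeffs, const in forms:
            value *= nu((sum(c * xi for c, xi in zip(coeffs, x)) + const) % N_prime)
        total += value
    return total / N_prime ** d


N_prime = 31
nu = lambda n: 1 + math.cos(2 * math.pi * n / N_prime)
single = [((1,), 0)]                                   # recovers the average (6.1)
triangle = [((1, 0), 0), ((0, 1), 0), ((1, 1), 0)]     # the system (x, y, x + y)
print(round(linear_forms_average(nu, single, N_prime), 6))    # 1.0
print(round(linear_forms_average(nu, triangle, N_prime), 6))  # 1.25
```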

Remarks. Note that (6.2) includes (6.1) as a special case. Strictly speaking, it would be more accurate to call measures "probability densities", and the linear forms condition is really an "affine-linear forms condition", but we will keep the notation as above for brevity and compatibility with [21]. In [21] the coefficients of the affine-linear forms were allowed to be rational with bounded numerator and denominator. Since $N'$ is a large prime, it is always possible in practice to clear denominators and deal only with forms having integer coefficients. Note that Theorem 5.1 is a (conditional) assertion that the $\Lambda'_{b,W}$ essentially obey the linear forms condition. Thus trying to establish the linear forms condition for $\Lambda'_{b,W}$ would essentially be as hard as trying to prove the Main Theorem. The point of the definition, however, is that it will suffice to achieve the much simpler task of majorising $\Lambda'_{b,W}$ by constant multiples of measures $\nu$ which obey this condition. Finally, we note that the error term in (6.2) is uniform over all choices of constant term $\Psi(0)$.

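Definition 6.3 (Correlation condition). Let $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ be a measure, and let $m_0$ be a positive integer parameter. We say that $\nu$ satisfies the $m_0$-correlation condition if for every $1 < m \le m_0$ there exists a weight function $\tau = \tau_m : \mathbb{Z}_{N'} \to \mathbb{R}^+$ which obeys the moment conditions

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}} \tau^q(n) \ll_{m,q} 1 \tag{6.3}$$

for all $1 \le q < \infty$ and such that

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}} \nu(n + h_1) \nu(n + h_2) \cdots \nu(n + h_m) \le \sum_{1 \le i < j \le m} \tau(h_i - h_j)$$

for all $h_1, \dots, h_m \in \mathbb{Z}_{N'}$, not necessarily distinct.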

Remarks. Because we are only seeking upper bounds here rather than asymptotics, this condition would follow from a standard upper bound sieve such as Selberg's sieve. One should compare this condition with the much more difficult prime tuples conjecture, which is part of the "infinite complexity" case $d = 1$, $t > 1$ of the generalised Hardy-Littlewood conjecture. The correlation condition will only be used implicitly in this paper, as it is needed in the proof of [21, Proposition 8.1], which is in turn used in the proof of Proposition 10.3.

Let $D$ be a positive integer. We call a measure $D$-pseudorandom if it obeys the $(D, D, D)$-linear forms and $D$-correlation conditions. In practice, we shall work with measures which are $D$-pseudorandom where $D$ is a sufficiently large function of $s, d, t, L$.
The exact value will not be terribly important for our arguments and, whilst it could 
be specified explicitly, we shall not do so. 

Our next task is to show that the functions $\Lambda'_{b_1,W}, \ldots, \Lambda'_{b_t,W}$ can be dominated by a $D$-pseudorandom measure for any fixed $D$ that we choose, provided we are willing to concede multiplicative constants that depend on $D$.

Proposition 6.4 (Domination by a pseudorandom measure). Let $D > 1$ be arbitrary. Then there is a constant $C_0 := C_0(D)$ such that the following is true. Let $C \geq C_0$, and suppose that $N' \in [CN, 2CN]$. Let $b_1, \ldots, b_t \in \{0, 1, \ldots, W - 1\}$ be coprime to $W := \prod_{p \leq w} p$. Then there exists a $D$-pseudorandom measure $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ which obeys the pointwise bounds

$$1 + \Lambda'_{b_1,W}(n) + \cdots + \Lambda'_{b_t,W}(n) \ll_D \nu(n)$$

for all $n \in [N^{3/5}, N]$, where we identify $n$ with an element of $\mathbb{Z}_{N'}$ in the obvious manner.

The proof of this proposition is a minor variant of that in [21]. For the sake of completeness we present a proof in Appendix D. The constant $C$ is a technicality needed to avoid certain "wraparound" issues when passing from $[N]$ to $\mathbb{Z}_{N'}$ and can be largely ignored.
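To see numerically why $\Lambda'_{b,W} - 1$ is the natural object to study, the following sketch computes the average of $\Lambda'_{b,W}$ over $[N]$. It assumes the normalisation $\Lambda'_{b,W}(n) = \frac{\phi(W)}{W}\Lambda'(Wn + b)$, with $\Lambda'(m) = \log m$ for prime $m$ and $0$ otherwise, which we take (as an assumption here) to be the definition from §5; the function names are hypothetical. By the prime number theorem in arithmetic progressions the average tends to 1 for fixed $w$.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: is_prime[m] for 0 <= m <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return is_prime


def w_tricked_average(N, w, b):
    """Average over n in [N] of (phi(W)/W) * Lambda'(W n + b), where W is the
    product of the primes p <= w and Lambda'(m) = log m on primes, 0 otherwise."""
    W = math.prod(p for p in range(2, w + 1)
                  if all(p % q for q in range(2, math.isqrt(p) + 1)))
    if math.gcd(b, W) != 1:
        raise ValueError("b must be coprime to W")
    phi_W = sum(1 for r in range(1, W + 1) if math.gcd(r, W) == 1)
    is_prime = primes_up_to(W * N + b)
    total = sum(math.log(W * n + b) for n in range(1, N + 1) if is_prime[W * n + b])
    return phi_W / W * total / N
```

With $w = 5$ (so $W = 30$) and $N = 10^4$ the average is already close to 1 for each admissible residue $b$.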



26 



BEN GREEN AND TERENCE TAO



The philosophy of the transference principle developed in [21] is that functions which are dominated by pseudorandom measures behave almost as if they were bounded, for the purposes of computing correlations and other multilinear averages. We shall see examples of this in later sections. For now, we turn to the first significant step in the paper, namely the reduction of matters to establishing a Gowers uniformity norm estimate for $\Lambda'_{b,W} - 1$.

7. Reduction to a Gowers norm estimate

We shall informally refer to a function $f : [N] \to \mathbb{C}$ as being Gowers uniform of order $s$ if its Gowers uniformity norm $\|f\|_{U^{s+1}[N]}$ is small; see Appendix B for definitions and basic properties of this norm. A basic principle is that Gowers uniform functions of order $s$ have a negligible impact on multilinear averages of complexity $s$ or less. An example of this is [21, Proposition 5.3], but we will prove a much more general result of this type here. We refer to such statements as generalised von Neumann theorems. The name originally came from results in ergodic theory such as [32, Theorem 11.1], but it has been convenient to use the name to describe a large number of contexts in additive combinatorics in which some kind of expression is bounded using Gowers norms.¹
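The $U^k$ norms can be computed by brute force on a small cyclic group. The sketch below uses the standard definition on $\mathbb{Z}_M$, $\|f\|_{U^k(\mathbb{Z}_M)}^{2^k} = \mathbb{E}_{x, h_1, \ldots, h_k} \prod_{\omega \in \{0,1\}^k} \mathcal{C}^{|\omega|} f(x + \omega \cdot h)$ with $\mathcal{C}$ complex conjugation, rather than the normalised $U^{s+1}[N]$ of Appendix B (which embeds $[N]$ in a larger cyclic group); this substitution is an assumption made for simplicity, and `gowers_norm` is a hypothetical helper. For prime $M$, the quadratic phase $e(x^2/M)$ has $U^2$ norm exactly $M^{-1/4}$ but $U^3$ norm 1: it is Gowers uniform of order 1 but not of order 2.

```python
import cmath
import itertools


def gowers_norm(f, k):
    """Brute-force U^k norm of f : Z_M -> C, with f given as a list of M values.

    ||f||^{2^k} = E_{x, h_1..h_k} prod_{omega in {0,1}^k} C^{|omega|} f(x + omega.h),
    where C denotes complex conjugation.
    """
    M = len(f)
    total = 0j
    for x in range(M):
        for h in itertools.product(range(M), repeat=k):
            term = 1 + 0j
            for omega in itertools.product((0, 1), repeat=k):
                point = (x + sum(w * hi for w, hi in zip(omega, h))) % M
                value = complex(f[point])
                # conjugate at the vertices with an odd number of 1s
                term *= value.conjugate() if sum(omega) % 2 else value
            total += term
    average = total.real / M ** (k + 1)  # real and non-negative in exact arithmetic
    return max(average, 0.0) ** (1.0 / 2 ** k)


# Example: a quadratic phase on Z_7.
M = 7
quad = [cmath.exp(2j * cmath.pi * x * x / M) for x in range(M)]
u2, u3 = gowers_norm(quad, 2), gowers_norm(quad, 3)  # M ** -0.25 and 1, up to rounding
```

The cost is $O(2^k M^{k+1})$, so this is only a sanity check for tiny $M$.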

A crucial observation in [21] is that this type of principle also applies to unbounded
functions, so long as these unbounded functions are in turn dominated pointwise by a 
suitably pseudorandom measure. 

Proposition 7.1 (Generalised von Neumann theorem). Let $s, t, d, L$ be positive integer parameters as usual. Then there are constants $C_1$ and $D$, depending on $s, t, d$ and $L$, such that the following is true. Let $C_1 \leq C \leq O_{s,t,d,L}(1)$ be arbitrary and suppose that $N' \in [CN, 2CN]$ is a prime. Let $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ be a $D$-pseudorandom measure, and suppose that $f_1, \ldots, f_t : [N] \to \mathbb{R}$ are functions with $|f_i(x)| \leq \nu(x)$ for all $i \in [t]$ and $x \in [N]$. Suppose that $\Psi = (\psi_1, \ldots, \psi_t)$ is a system of affine-linear forms in $s$-normal form with $\|\Psi\|_N \leq L$. Let $K \subseteq [-N, N]^d$ be a convex body such that $\Psi(K) \subseteq [N]^t$.

Suppose also that

$$\min_{1 \leq j \leq t} \|f_j\|_{U^{s+1}[N]} \leq \delta$$

for some $\delta > 0$. Then we have

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} f_i(\psi_i(n)) = \kappa(\delta) N^d + o(N^d). \tag{7.1}$$

Remarks. For an explanation of the $\kappa$-notation, we refer the reader to […]. One could specify explicit values for $C_1, D$, but we have not done so. In applications to the primes we will always take $C \geq C_0(D)$, where $C_0$ is the function defined in Proposition 6.4.
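The simplest instance of Proposition 7.1 takes $s = 1$, $\nu \equiv 1$ and the system $(x, x + h, x + 2h)$. Working on $\mathbb{Z}_M$ with $M$ odd and averaging over the whole group instead of a convex body (both simplifying assumptions), the classical Fourier argument gives $|\mathbb{E}_{x,h} f_1(x) f_2(x+h) f_3(x+2h)| \leq \min_j \|f_j\|_{U^2(\mathbb{Z}_M)}$ whenever $|f_j| \leq 1$. The sketch below, with hypothetical helper names, checks this numerically for random 1-bounded functions; it illustrates, but of course does not prove, the inequality.

```python
import itertools
import random


def u2_norm(f):
    """||f||_{U^2(Z_M)} for real-valued f, by brute force over (x, a, b)."""
    M = len(f)
    total = sum(f[x] * f[(x + a) % M] * f[(x + b) % M] * f[(x + a + b) % M]
                for x, a, b in itertools.product(range(M), repeat=3))
    return max(total / M ** 3, 0.0) ** 0.25


def three_ap_average(f1, f2, f3):
    """E_{x, h in Z_M} f1(x) f2(x + h) f3(x + 2h), i.e. the forms (x, x + h, x + 2h)."""
    M = len(f1)
    total = sum(f1[x] * f2[(x + h) % M] * f3[(x + 2 * h) % M]
                for x, h in itertools.product(range(M), repeat=2))
    return total / M ** 2


# Random 1-bounded functions on Z_13 (M odd, so that h -> 2h is a bijection).
rng = random.Random(0)
M = 13
trials = [[[rng.uniform(-1.0, 1.0) for _ in range(M)] for _ in range(3)]
          for _ in range(5)]
```

The parity assumption on $M$ matters: the Fourier argument uses that $\xi \mapsto -2\xi$ is a bijection of $\mathbb{Z}_M$.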

This proposition is a variant of [21, Proposition 5.3]. It is somewhat more elaborate than that result in that it applies to a general system of affine-linear forms, and one has the flexibility of summing over an arbitrary convex body. Once the convex body is handled by standard techniques, however, the only real tool that is needed is several applications of the Cauchy-Schwarz inequality. This is a common feature of generalised von Neumann theorems. We give a proof of Proposition 7.1 in Appendix C which uses some preliminaries in Appendices A and B but is otherwise self-contained. Using Propositions 6.4 and 7.1 we reduce Theorem 5.2, and hence the Main Theorem, to the following Gowers uniformity estimate.

¹Another example of this is the Koopman-von Neumann theorem, which we will introduce in §10.

Theorem 7.2 (Gowers uniformity estimate). Let $N, w > 1$, and let $b \in [W]$ be coprime to $W = \prod_{p \leq w} p$. Suppose that the inverse Gowers-norm conjecture GI(s) and the Mobius and nilsequences conjecture MN(s) are true for some $s \geq 1$. Then we have

$$\|\Lambda'_{b,W} - 1\|_{U^{s+1}[N]} = o(1).$$

Remark. Observe (cf. Examples 2 and […]) that this theorem is a special case of Theorem 5.2. Thus the generalised von Neumann theorem, Proposition 7.1, can be viewed as an assertion that the $U^{s+1}$ average is "universal" or "characteristic" among all multilinear averages of complexity $s$, even when dealing with functions that are bounded only by a pseudorandom measure.



Proof of Main Theorem assuming Theorem 7.2. By previous reductions, it suffices to prove Theorem 5.2. Let the notation and assumptions be as in that theorem. By enlarging $N$ by a multiplicative factor of $O(1)$ if necessary we may assume that $\Psi(K) \subseteq [N]^t$. Let $D = D_{s,t,d,L}$ be the constant in Proposition 7.1 and set $C := \max(C_1, C_0(D))$, where $C_0$ is the function appearing in Proposition 6.4 and $C_1$ is the one appearing in Proposition 7.1. Applying Bertrand's postulate, we may select a prime $N'$ such that $CN \leq N' \leq 2CN$. Let $\nu$ be the $D$-pseudorandom measure given by Proposition 6.4. Then the functions $f_i(n) := c \cdot (\Lambda'_{b_i,W}(n) - 1)$ will be pointwise dominated in magnitude by $\nu$ for some suitably small constant $c = c_{s,t,d,L} > 0$. Applying Theorem 7.2 and Proposition 7.1 we obtain the desired estimate after dividing out the factors of $c$. □
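The choice of the auxiliary modulus $N'$ in the proof above is easy to make explicit. Assuming $C$ is a positive integer (an assumption made for the sketch; in the proof it is merely a constant), the following hypothetical helper searches $[CN, 2CN]$ for a prime, and Bertrand's postulate guarantees that the search succeeds. The trial-division test is only suitable for small illustrative inputs.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small illustrative inputs."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def choose_prime_modulus(C, N):
    """Least prime N' with C*N <= N' <= 2*C*N, for positive integers C and N.

    Bertrand's postulate (for every m >= 1 there is a prime p with m < p <= 2m)
    guarantees that the loop finds one.
    """
    for candidate in range(C * N, 2 * C * N + 1):
        if is_prime(candidate):
            return candidate
    raise AssertionError("unreachable, by Bertrand's postulate")
```

For instance, `choose_prime_modulus(3, 10)` returns 31, the least prime in $[30, 60]$.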



We have now completed yet another reduction, and it remains to prove Theorem 7.2. Note that we have eliminated the system $\Psi$ of affine-linear forms, as well as the convex body $K$, replacing them both with the Gowers norm $U^{s+1}[N]$; the parameters $d, t$ have
also disappeared. In order to proceed further, we need to exploit some deeper facts and 
conjectures concerning the Gowers norm. In particular we shall shortly need the inverse 
Gowers-norm conjecture GI(s), to which we now turn. 



8. The inverse Gowers-norm and Mobius and nilsequences conjectures 



Nilsequences. The purpose of this section is to state the two conjectures GI(s) and MN(s) which have appeared in many of the above theorems, most recently in Theorem 7.2. Both conjectures revolve around the concept of a nilsequence, which we now pause to recall.

Definition 8.1 (Nilmanifolds and nilsequences). Let $G$ be a connected, simply connected Lie group. We define the central series $G_0 \supseteq G_1 \supseteq G_2 \supseteq \ldots$ by defining $G_0 = G_1 = G$, and $G_{i+1} = [G, G_i]$ for $i \geq 1$, where the commutator group $[G, G_i]$ is the group generated by $\{g h g^{-1} h^{-1} : g \in G, h \in G_i\}$. We say that $G$ is $s$-step nilpotent if $G_{s+1} = \{1\}$. Let $\Gamma \subseteq G$ be a discrete, cocompact subgroup. Then the quotient $G/\Gamma$ is called an $s$-step nilmanifold. If $g \in G$ then $g$ acts on $G/\Gamma$ by left multiplication, $x \mapsto g \cdot x$. By an $s$-step nilsequence, we mean a sequence of the form $(F(g^n x))_{n \in \mathbb{N}}$, where $x \in G/\Gamma$ is a point and $F : G/\Gamma \to \mathbb{R}$ is a continuous function. We say that the nilsequence is 1-bounded if $F$ takes values in $[-1, 1]$.

Remark. For a full technical treatment of nilsequences, see […]. The reader might consult […], [32], [35] for the ergodic theory perspective, or other papers of the authors [23], [26], [27] for various discussions more-or-less in the spirit of additive combinatorics.

As remarked above, the exact definition of a nilsequence will not be terribly important to our arguments here. In the $s = 2$ case, representative examples of nilsequences are those associated to the Heisenberg nilmanifold, which is discussed in detail in […], [23], [26], [27]. See also the proof of Proposition 8.4.

Remark. Note that we are requiring our nilpotent groups to be connected and simply 
connected. The latter hypothesis is not overly restrictive, since if G is connected, 
then it may be assumed to be simply connected by passing to a universal cover. The 
connectedness assumption however is more substantial; the nilpotent groups constructed 
in the ergodic theory literature (e.g. in [32]) are not always shown to be connected. 
However, Sasha Leibman [36] has indicated to us that it suffices, in the context of the 
GI(s) conjecture, to deal with connected G. We will elaborate on this point in a future 
paper if necessary, but the issue does not need to be addressed here. This is because the 
arguments used in proving the cases $s \leq 2$, which are the only cases of the conjectures
established so far, give connectedness as a byproduct. 

As we shall need to be rather quantitative regarding these nilmanifolds, we shall arbitrarily endow² each nilmanifold $G/\Gamma$ with a smooth Riemannian metric $d_{G/\Gamma}$. We then define the Lipschitz constant of a nilsequence $F(g^n x)$ to be the Lipschitz constant of $F$.

Remark. Note that the Lipschitz constant of a nilsequence depends on the choice of metric $d_{G/\Gamma}$ one places on the nilmanifold; there is no obvious canonical metric to assign to any given nilmanifold, and so the Lipschitz constant is a somewhat arbitrary quantity. However, if one replaces the metric with another smooth Riemannian metric, then from the compactness of $G/\Gamma$ we see that the Lipschitz constant is affected by at most a multiplicative constant. One could replace the Lipschitz constant here by other quantitative measures of regularity, such as Holder continuity norms or $C^k$ norms, but this will not significantly affect the statements of the conjectures here, basically because a function which is controlled in one of these norms can be approximated in a quantitative manner as the uniform limit of functions controlled in any other of these norms.



²Strictly speaking, we are abusing notation here; a nilmanifold should not be represented solely by the quotient space $G/\Gamma$, but rather as a quadruplet $(G, \Gamma, G/\Gamma, d_{G/\Gamma})$ (and the Lie group $G$ should in turn be expanded to explicitly mention the group operations, coordinate charts, etc.). Similarly, the nilsequence should not be represented solely as $F(g^n x)$, but should really be the octuplet $(G, \Gamma, G/\Gamma, d_{G/\Gamma}, g, x, F, (n \mapsto F(g^n x)))$. However we shall continue to abuse notation in order to simplify the exposition.

Remark. The Lipschitz nilsequences form an algebra in the following sense: if $f(n)$ is an $s$-step nilsequence on $G/\Gamma$ with Lipschitz constant $M$, and $\tilde f(n)$ is an $s$-step nilsequence on $\tilde G/\tilde\Gamma$ with Lipschitz constant $\tilde M$, and both nilsequences are bounded by $O(1)$, then $f(n) \pm \tilde f(n)$ or $f(n)\tilde f(n)$ is an $s$-step nilsequence on the product nilmanifold $(G/\Gamma) \times (\tilde G/\tilde\Gamma)$ with Lipschitz constant $O_{M,\tilde M}(1)$. However, nilsequences as we have defined them are not closed under uniform limits. This leads to a slight conflict between the nomenclature of the present paper and that of (for example) [6]. In that paper the objects we have called nilsequences are referred to as basic nilsequences; a nilsequence is then a uniform limit of basic nilsequences. Since our analysis is essentially finitary in nature we will not make any further mention of this distinction.

The inverse Gowers-norm conjecture. An important feature of $s$-step nilmanifolds is that they have significant "constraints" connecting arithmetic progressions of length $s + 2$, or cubes of dimension $s + 1$. Roughly speaking, given the first $s + 1$ elements $x, g \cdot x, g^2 \cdot x, \ldots, g^s \cdot x$ of a progression in an $s$-step nilmanifold $G/\Gamma$, the next element $g^{s+1} \cdot x$ of the progression and all further elements are essentially completely determined as continuous functions of these first $s + 1$ elements. For a precise formulation of this assertion see [26, Lemma 12.7]. Similarly, when considering an $(s+1)$-dimensional "cube" $\{g_1^{\omega_1} \cdots g_{s+1}^{\omega_{s+1}} \cdot x : (\omega_1, \ldots, \omega_{s+1}) \in \{0, 1\}^{s+1}\}$ in $G/\Gamma$, the final vertex $g_1 \cdots g_{s+1} \cdot x$ of this cube is essentially a continuous function of the other $2^{s+1} - 1$ elements of this cube. See Appendix E for more precise formulations of this statement, which we will make heavy use of in this paper. As a consequence of either of these facts, we can relate nilsequences to the $U^{s+1}$ norm. The next result is in this direction, but it is not sufficiently general for our later applications. We state it now to introduce the concept of nilsequences obstructing uniformity, and because it can be proved using earlier results.

Proposition 8.2 (Nilsequences obstruct uniformity). Let $s \geq 1$ be an integer and let $\delta \in (0, 1)$ be real. Let $G/\Gamma = (G/\Gamma, d_{G/\Gamma})$ be an $s$-step nilmanifold with some fixed smooth metric $d_{G/\Gamma}$, and let $(F(g^n x))_{n \in \mathbb{N}}$ be a bounded $s$-step nilsequence with Lipschitz constant at most $M$. Let $f : [N] \to [-1, 1]$ be a function for which

$$|\mathbb{E}_{n \in [N]} f(n) F(g^n x)| \geq \delta.$$

Then we have

$$\|f\|_{U^{s+1}[N]} \gg_{s,\delta,M,G/\Gamma} 1.$$

Proof. See [26, Prop. 12.6]. The lower bound arising in that proposition was stated to depend on the continuous function $F : G/\Gamma \to \mathbb{C}$, and not just on $\|F\|_{\mathrm{Lip}}$. However, an examination of the proof reveals that the argument can be made uniform in $F$, for a given value of $\|F\|_{\mathrm{Lip}}$. □

Remark. It turns out that one can relax the assumption that $f$ be uniformly bounded, requiring only that $f$ be bounded in a suitable $L^p$ norm; see Corollary 11.6.

The inverse Gowers-norm conjecture is an assertion in the converse direction, that nilsequences are the only obstruction to uniformity. More precisely, we have for each $s \geq 1$ the following conjecture:






Conjecture 8.3 (GI(s) conjecture). Suppose that $0 < \delta \leq 1$. Then there exists a finite collection $\mathcal{M}_{s,\delta}$ of $s$-step nilmanifolds $G/\Gamma = (G/\Gamma, d_{G/\Gamma})$ with the following property. Given any $N$ and any $f : [N] \to [-1, 1]$ such that

$$\|f\|_{U^{s+1}[N]} \geq \delta,$$

there is a nilmanifold $G/\Gamma \in \mathcal{M}_{s,\delta}$ and a 1-bounded $s$-step nilsequence $(F(g^n x))_{n \in \mathbb{N}}$ on it with Lipschitz constant $O_{s,\delta}(1)$, such that

$$|\mathbb{E}_{n \in [N]} f(n) F(g^n x)| \gg_{s,\delta} 1.$$



This conjecture in this form is due to the authors. It was hinted at in [26, §13] and is being stated formally for the first time here. The evidence in favour of it is strong. First of all we know that the cases $s = 1, 2$ are true. The case $s = 1$ is an exercise in harmonic analysis. Indeed in this case one can take $G/\Gamma$ to just be the standard unit circle $\mathbb{R}/\mathbb{Z}$, so that $\mathcal{M}_{1,\delta}$ is a singleton set independent of $\delta$. The case $s = 2$ was established, with some effort, in [26] and is stated in Proposition 8.4 below. Note that things are not so simple when $s > 1$, and it is known that as $\delta$ decreases to zero, the collection $\mathcal{M}_{s,\delta}$ of nilmanifolds $G/\Gamma$ that one must employ must have cardinality going to infinity.³
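For $s = 1$ the "exercise in harmonic analysis" can be made concrete. On $\mathbb{Z}_M$ (used here in place of $[N]$ for simplicity, an assumption of this sketch), $\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4 \leq \max_\xi |\hat f(\xi)|^2 \|f\|_{L^2}^2$, so any $f$ with $|f| \leq 1$ and $\|f\|_{U^2} \geq \delta$ correlates with some character $e(x\xi/M)$, a 1-step nilsequence on $\mathbb{R}/\mathbb{Z}$, to the extent at least $\delta^2$. The sketch below, with hypothetical helper names, checks the identity and the resulting bound numerically.

```python
import cmath
import itertools
import random


def fourier_coefficients(f):
    """Normalised Fourier coefficients hat f(xi) = E_x f(x) e(-x xi / M) on Z_M."""
    M = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / M) for x in range(M)) / M
            for xi in range(M)]


def u2_fourth_power(f):
    """||f||_{U^2(Z_M)}^4 for real-valued f, by brute force over (x, a, b)."""
    M = len(f)
    total = sum(f[x] * f[(x + a) % M] * f[(x + b) % M] * f[(x + a + b) % M]
                for x, a, b in itertools.product(range(M), repeat=3))
    return total / M ** 3


rng = random.Random(1)
f = [rng.uniform(-1.0, 1.0) for _ in range(17)]
coeffs = fourier_coefficients(f)
delta = max(u2_fourth_power(f), 0.0) ** 0.25   # ||f||_{U^2}
best = max(abs(c) for c in coeffs)             # largest correlation with a character
```

For higher $s$ no such one-line argument is available, which is why GI(s) is a genuine conjecture.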

Proposition 8.4 (The GI(2) conjecture, [26]). The GI(2) conjecture holds in the form stated above. In fact the group $G$ may be taken to be a product of $O(\delta^{-C})$ Heisenberg groups

$$\begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix},$$

and the discrete cocompact subgroup $\Gamma$ may be taken to be a product of copies of

$$\begin{pmatrix} 1 & \mathbb{Z} & \mathbb{Z} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix}.$$

Proof. This is almost [26, Thm. 12.8]. In that theorem, a nilsequence was constructed in a somewhat ad hoc manner from another type of object, a generalised quadratic phase. In the argument of that paper, however, the nilpotent groups constructed were not all Heisenberg groups. Some of them were isomorphic to $\mathbb{R}^2 \times \mathbb{Z}$, which is not connected and hence, with our definition, cannot be used to construct a nilmanifold.

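More precisely, in the proof of [26, Thm. 12.8] it is shown that if $\|f\|_{U^3[N]} \geq \delta$ then

$$|\mathbb{E}_{n \in [N]} f(n) F_1(g^n x) e(n^2\theta)| \gg \exp(-\delta^{-C}),$$

where $F_1(g^n x)$ is a product of nilsequences coming from $O(\delta^{-C})$ Heisenberg groups (which are all connected and simply connected), $\theta \in \mathbb{R}/\mathbb{Z}$, and $e(x) := e^{2\pi i x}$. In [26, Thm. 12.8] we proceeded by constructing $e(n^2\theta)$ as a nilsequence coming from a skew torus which, being a quotient of the disconnected nilpotent Lie group

$$\begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix},$$

is not immediately helpful in the present context. However we might just as easily have observed that

$$\begin{pmatrix} 1 & -\theta & -\theta \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}^n = \begin{pmatrix} 1 & -n\theta & -n^2\theta \\ 0 & 1 & 2n \\ 0 & 0 & 1 \end{pmatrix},$$

which, upon quotienting by the right action of $\Gamma = \begin{pmatrix} 1 & \mathbb{Z} & \mathbb{Z} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix}$, leads to

$$\begin{pmatrix} 1 & -\theta & -\theta \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}^n \Gamma = \begin{pmatrix} 1 & \{-n\theta\} & \{n^2\theta\} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \Gamma.$$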



³This seems to be related to the fact, known to the ergodic theorists, that the inverse limit of 1-step nilsystems is a 1-step nilsystem, but the same is not true for $s$-step nilsystems, $s \geq 2$.

Here we have moved our matrix under the right action of $\Gamma$ so that it lies in the fundamental domain

$$\left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} : x, y, z \in (-\tfrac{1}{2}, \tfrac{1}{2}] \right\};$$

see […] for further discussion. The fractional parts $\{t\}$ are chosen to lie in $(-\frac{1}{2}, \frac{1}{2}]$.

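This almost exhibits $e(n^2\theta)$ as a nilsequence coming from the Heisenberg group, but there is one small problem: the function

$$\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \mapsto e(z)$$

from the fundamental domain to $\mathbb{C}$ does not extend to a continuous function on $G/\Gamma$, since there are discontinuities on the boundary of the fundamental domain.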

To get around this one may introduce a smooth partition of unity $(\chi_j)_{j \in J}$ on $(\mathbb{R}/\mathbb{Z})^2$, where each function $\chi_j$ is supported on (say) a square of width $1/100$. Each function

$$\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \mapsto \chi_j(x, y) e(z)$$

does extend to a Lipschitz function on $G/\Gamma$. This makes it clear that $e(n^2\theta)$ may, after all, be realised as a nilsequence coming from a product of $O(1)$ Heisenberg groups. □
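The matrix computation in the proof of Proposition 8.4 can be checked in exact rational arithmetic. In the sketch below a triple $(x, y, z)$ stands for the upper-triangular matrix with first row $(1, x, z)$ and second row $(0, 1, y)$; the helper names are our own, and the fundamental domain is taken to be the set of matrices with $x, y, z \in (-\frac12, \frac12]$. The code verifies that $g^n = (-n\theta, 2n, -n^2\theta)$ for $g = (-\theta, 2, -\theta)$, and that reducing modulo the right action of $\Gamma$ gives $(\{-n\theta\}, 0, \{n^2\theta\})$, so the $z$-coordinate recovers $e(n^2\theta)$.

```python
import cmath
import math
from fractions import Fraction


def heis_mul(a, b):
    """Multiply triples (x, y, z) standing for [[1, x, z], [0, 1, y], [0, 0, 1]]."""
    x1, y1, z1 = a
    x2, y2, z2 = b
    return (x1 + x2, y1 + y2, z1 + z2 + x1 * y2)


def centred_frac(t):
    """The representative of t mod 1 lying in (-1/2, 1/2]."""
    return t - math.ceil(t - Fraction(1, 2))


def reduce_mod_gamma(g):
    """Right-multiply g by an integer Heisenberg matrix so that x, y, z lie in (-1/2, 1/2]."""
    k_y = g[1] - centred_frac(g[1])
    g = heis_mul(g, (0, -k_y, 0))     # shifts y by an integer; z changes by -x * k_y
    k_x = g[0] - centred_frac(g[0])
    g = heis_mul(g, (-k_x, 0, 0))     # shifts x; z is unchanged since this factor has y = 0
    k_z = g[2] - centred_frac(g[2])
    return heis_mul(g, (0, 0, -k_z))  # shifts z


def check_quadratic_phase(theta, n_max):
    """Check g^n and its reduction modulo Gamma for n = 1, ..., n_max."""
    g = (-theta, 2, -theta)
    power = (0, 0, 0)  # identity matrix
    for n in range(1, n_max + 1):
        power = heis_mul(power, g)  # power = g^n
        assert power == (-n * theta, 2 * n, -n * n * theta)
        reduced = reduce_mod_gamma(power)
        assert reduced == (centred_frac(-n * theta), 0, centred_frac(n * n * theta))
        # the z-coordinate of the reduced point recovers e(n^2 theta)
        phase = cmath.exp(2j * cmath.pi * float(reduced[2]))
        assert abs(phase - cmath.exp(2j * cmath.pi * float(n * n * theta % 1))) < 1e-9
    return True
```

Exact `Fraction` arithmetic is used so that the equalities with fractional parts hold exactly rather than up to rounding.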

For higher values of $s$, the conjecture GI(s) remains open. However, significant support in favour of this conjecture arises from the combinatorial and Fourier-analytic work of Gowers […], in which a "local" form of this conjecture was established in order to provide a new proof of Szemeredi's theorem. Further substantial support for the conjecture comes from the ergodic-theoretic work of Host-Kra [32].

The Mobius and nilsequences conjecture. Our main results are concerned with the von Mangoldt function $\Lambda(n)$ and with functions derived from $\Lambda$, such as $\Lambda'_{b,W}$. It turns out, however, to be convenient to rewrite this function in terms of the closely related Mobius function $\mu : \mathbb{Z} \to \{-1, 0, +1\}$, defined by setting $\mu(n) := (-1)^d$ when $n$ is the product of $d$ distinct primes, and $\mu(n) = 0$ otherwise. The main advantage of doing so is that $\mu$ is a 1-bounded function, whereas $\Lambda$ patently is not. As is well known, $\Lambda$ and $\mu$ are related by the identity

$$\Lambda(n) = \sum_{d \mid n} \mu(d) \log\frac{n}{d} = -\sum_{d \mid n} \mu(d) \log d \tag{8.1}$$

for all $n \geq 1$. In principle this allows us to reduce the task of estimating correlations involving $\Lambda$ to that of estimating correlations involving $\mu$, although when doing so the unbounded weight $\log\frac{n}{d}$ and the summation over $d$ will introduce some dangerous factors of $O(\log N)$ which must be handled with some caution.
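Identity (8.1) is easy to confirm numerically. The following sketch (a sanity check only; the helper names are hypothetical) computes $\mu$ and $\Lambda$ by trial division and evaluates both right-hand sides of (8.1).

```python
import math


def factorize(n):
    """Prime factorisation of n >= 1 as {p: exponent}, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def mobius(n):
    """mu(n) = (-1)^k if n is a product of k distinct primes, and 0 otherwise."""
    exponents = factorize(n).values()
    if any(e > 1 for e in exponents):
        return 0
    return (-1) ** len(exponents)


def von_mangoldt(n):
    """Lambda(n) = log p if n = p^j for a prime p and some j >= 1, and 0 otherwise."""
    factors = factorize(n)
    return math.log(next(iter(factors))) if len(factors) == 1 else 0.0


def mangoldt_via_mobius(n):
    """The two right-hand sides of (8.1)."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    first = sum(mobius(d) * math.log(n / d) for d in divisors)
    second = -sum(mobius(d) * math.log(d) for d in divisors)
    return first, second
```

The weights $\log(n/d)$ visible here are exactly the unbounded factors that the text warns must be handled with care.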

Suppose we formally apply Conjecture 8.3 to the task of proving Theorem 7.2, ignoring for now the significant issue that $\Lambda'_{b,W} - 1$ is not uniformly bounded. Then we expect to reduce this theorem to the assertion that $\Lambda'_{b,W} - 1$ has small correlation with any $s$-step nilsequence. In the light of (8.1), we expect this statement to be related to the corresponding assertion for the Mobius function $\mu$. We formalise this latter statement as the following conjecture.

Conjecture 8.5 (MN(s) conjecture). Let $G/\Gamma = (G/\Gamma, d_{G/\Gamma})$ be an $s$-step nilmanifold with smooth metric $d_{G/\Gamma}$, and let $(F(g^n x))_{n \in [N]}$ be a bounded $s$-step nilsequence with Lipschitz constant $M$. Then we have the bound

$$|\mathbb{E}_{n \in [N]} \mu(n) F(g^n x)| \ll_{A,M,G/\Gamma,s} \log^{-A} N$$

for any real number $A > 0$.

Remark. It is important to note that the implied constant is not allowed to depend on $g$ and $x$. The case $s = 1$ can be reduced to a classical result of Davenport […]; see [27, §6] for details. The case $s = 2$ was the main result of [27]. The case $s > 2$ remains open; however, we certainly expect MN(s) to be true in this case because of the Mobius randomness heuristic from analytic number theory, which states that $\mu$ exhibits a substantial degree of orthogonality to any suitably "Lipschitz" function. Moreover, it seems likely that the techniques we developed to prove MN(2) will eventually extend to cover MN(s), $s \geq 3$, as well. This is another ongoing area of research. As is well known, even when $s = 1$ the current technology for establishing this conjecture yields ineffective implied constants in the $\ll_{A,M,G/\Gamma}$ notation, due to our lack of knowledge regarding the existence of Siegel zeroes. This ultimately makes the decay rates in the Main Theorem (and its corollaries) similarly ineffective. If the GRH is assumed, the estimates do become effective. However, they are still somewhat poor for $s \geq 2$, largely because the bounds in the GI(2) conjecture obtained in [26] are a little weaker than one might hope for.

9. Correlation estimates for Mobius and Liouville

Perhaps the heart of the present paper is §10, in which it is shown how, in certain circumstances, the requirement of 1-boundedness can be dropped in the GI(s) conjecture. This section is an aside to the main line of our argument, in which we use what we already have to obtain estimates similar to the generalised Hardy-Littlewood conjecture for the Mobius function and the related Liouville function $\lambda : \mathbb{N} \to \{-1, +1\}$, defined to be the unique completely multiplicative function such that $\lambda(p) = -1$ for all primes $p$.

Proposition 9.1 (Correlation estimates for $\mu$ and $\lambda$). Let $d, t, L$ be positive integers, let $N$ be a large positive integer parameter, and let $\Psi = (\psi_1, \ldots, \psi_t)$ be a system of affine-linear forms with size $\|\Psi\|_N \leq L$ and complexity at most $s$. Assume the GI(s) and MN(s) conjectures. Let $K \subseteq [-N, N]^d$ be a convex body. Then we have

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \mu(\psi_i(n)) = o_{s,t,d,L}(N^d) \tag{9.1}$$

and

$$\sum_{n \in K \cap \mathbb{Z}^d} \prod_{i \in [t]} \lambda(\psi_i(n)) = o_{s,t,d,L}(N^d). \tag{9.2}$$

Remark. Note the lack of any local factors $\beta_p, \beta_\infty$. This makes Proposition 9.1 rather appealing from a certain point of view. It also provides an instance of the "Mobius randomness heuristic" alluded to above.

Proof. We begin by applying Proposition 7.1, the generalised von Neumann theorem. Since $\mu$ and $\lambda$ are 1-bounded, this may be applied with the pseudorandom measure $\nu$ set equal to the constant function 1, which is obviously $D$-pseudorandom for all $D$. We note that in this case the proof of Proposition 7.1 that we give in Appendix C is rather simpler than in the case of a more general $\nu$; specifically, one can use Corollary B.3 in place of Corollary B.4, while the verification of (C.10), (C.11) is trivial when $\nu = 1$.



The application of Proposition 7.1 reduces (9.1) to the statement

$$\|\mu\|_{U^{s+1}[N]} = o(1). \tag{9.3}$$

Applying the GI(s) conjecture, it is sufficient to establish that

$$\mathbb{E}_{n \leq N} \mu(n) F(g^n x) = o_{s,M,\delta}(1) \tag{9.4}$$

uniformly over all $G/\Gamma \in \mathcal{M}_{s,\delta}$ and all 1-bounded $M$-Lipschitz nilsequences $(F(g^n x))_{n \leq N}$ on $G/\Gamma$. Indeed the truth of such a statement implies, by the GI(s) conjecture, that $\|\mu\|_{U^{s+1}[N]} \leq \delta$ for all sufficiently large $N$; one can then take $\delta$ arbitrarily small to deduce (9.3). Recalling that $|\mathcal{M}_{s,\delta}| = O_{s,\delta}(1)$, we see that (9.4) follows immediately from (a weak form of) the MN(s) conjecture. This proves (9.1).

The proof of (9.2) proceeds similarly. It suffices to establish the analogue of (9.4), that is to say the bound

$$\mathbb{E}_{n \leq N} \lambda(n) F(g^n x) = o_{s,M,\delta}(1) \tag{9.5}$$

uniformly over all $G/\Gamma \in \mathcal{M}_{s,\delta}$ and all 1-bounded $M$-Lipschitz nilsequences $(F(g^n x))_{n \leq N}$ on $G/\Gamma$. We begin by noting the identity

$$\lambda(n) = \sum_{d^2 \mid n} \mu(n/d^2).$$

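This implies that for any positive real $X$, any fixed $G/\Gamma \in \mathcal{M}_{s,\delta}$ and any 1-bounded $M$-Lipschitz nilsequence $(F(g^n x))_{n \leq N}$ on $G/\Gamma$ we have

$$\begin{aligned}
\mathbb{E}_{n \leq N} \lambda(n) F(g^n x) &= \mathbb{E}_{n \leq N} \sum_{d^2 \mid n} \mu(n/d^2) F(g^n x) \\
&= \sum_{d \leq X} \mathbb{E}_{n \leq N} 1_{d^2 \mid n}\, \mu(n/d^2) F(g^n x) + \sum_{d > X} \mathbb{E}_{n \leq N} 1_{d^2 \mid n}\, \mu(n/d^2) F(g^n x) \\
&= \sum_{d \leq X} \frac{1}{d^2} \mathbb{E}_{k \leq N/d^2} \mu(k) F(g^{d^2 k} x) + O(X^{-1}).
\end{aligned} \tag{9.6}$$

By replacing $g$ by $g^{d^2}$ in the MN(s) conjecture we obtain the bound

$$\mathbb{E}_{k \leq N/d^2} \mu(k) F(g^{d^2 k} x) = o_{G/\Gamma,M,X}(1)$$

for each $d \leq X$. Substituting into (9.6) we obtain

$$\mathbb{E}_{n \leq N} \lambda(n) F(g^n x) = o_{G/\Gamma,M,X}(1) + O(X^{-1}).$$

Let $\varepsilon > 0$ be arbitrary. Taking $X := 1/\varepsilon$, we may make this expression smaller than a constant times $\varepsilon$ by taking $N$ sufficiently large. This implies that

$$\mathbb{E}_{n \leq N} \lambda(n) F(g^n x) = o_{G/\Gamma,M}(1).$$

Recalling once more that $|\mathcal{M}_{s,\delta}| = O_{s,\delta}(1)$, we therefore obtain (9.5) and hence (9.2). □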
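The identity $\lambda(n) = \sum_{d^2 \mid n} \mu(n/d^2)$ used in the proof above (equivalently, $\sum_n \lambda(n) n^{-s} = \zeta(2s)/\zeta(s)$) can likewise be checked directly for small $n$. A minimal sketch with hypothetical helper names:

```python
def factorize(n):
    """Prime factorisation of n >= 1 as {p: exponent}, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def mobius(n):
    """mu(n) = (-1)^k if n is a product of k distinct primes, and 0 otherwise."""
    exponents = factorize(n).values()
    if any(e > 1 for e in exponents):
        return 0
    return (-1) ** len(exponents)


def liouville(n):
    """lambda(n) = (-1)^Omega(n), where Omega counts prime factors with multiplicity."""
    return (-1) ** sum(factorize(n).values())


def liouville_via_mobius(n):
    """The right-hand side sum over d with d^2 | n of mu(n / d^2)."""
    total, d = 0, 1
    while d * d <= n:
        if n % (d * d) == 0:
            total += mobius(n // (d * d))
        d += 1
    return total
```

The sum over $d$ is what the proof splits at $d = X$; the terms with $d > X$ are supported on a set of density at most $\sum_{d > X} d^{-2} = O(X^{-1})$.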




Let us remark that, as with the Main Theorem, Proposition 9.1 is unconditional in the cases $s = 1, 2$.

We conclude with a mention of a conjecture of Chowla [8], which asserts that $\lambda$ is uniformly distributed on any polynomial (other than a constant multiple of a perfect square), thus for instance

$$\mathbb{E}_{y_1, y_2 \leq N} \lambda(P(y_1, y_2)) = o_P(1) \tag{9.7}$$

for any such polynomial $P : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ of two variables. Our results imply (for instance) the following case of Chowla's conjecture.

Proposition 9.2. Let $P : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ be a polynomial of degree at most 4 which is the product of homogeneous linear factors over $\mathbb{Q}$, and which is not a rational multiple of a perfect square. Then we have

$$\mathbb{E}_{y_1, y_2 \leq N} \lambda(P(y_1, y_2)) = o_P(1).$$

The proof is immediate from (9.2) and the complete multiplicativity of $\lambda$; note that we can easily eliminate any repeated factors in $P$ and so the system of linear forms associated to $P$ will be non-degenerate. We remark that this conjecture was also recently verified for all homogeneous polynomials of degree at most three in [29], [30]. Removing the homogeneity assumption looks hopeless with current technology; the case $P(y_1, y_2) = y_1(y_1 + 2)$ is already roughly of the same order of difficulty as the twin prime conjecture.
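The reduction behind Proposition 9.2 can be seen concretely. For the example $P(y_1, y_2) = y_1 y_2 (y_1 + y_2)(y_1 + 2y_2)$ (our choice: degree 4, distinct homogeneous linear factors), complete multiplicativity gives $\lambda(P(y_1, y_2)) = \lambda(y_1)\lambda(y_2)\lambda(y_1 + y_2)\lambda(y_1 + 2y_2)$, so the Chowla average (9.7) is exactly a correlation of the form (9.2). The sketch below, with hypothetical helper names, computes the average both ways; it illustrates the reduction and makes no claim about the rate of decay.

```python
import math


def liouville_table(limit):
    """lambda(n) for 1 <= n <= limit via a smallest-prime-factor sieve (index 0 unused)."""
    spf = list(range(limit + 1))
    for p in range(2, math.isqrt(limit) + 1):
        if spf[p] == p:                    # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    if limit >= 1:
        lam[1] = 1
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]         # complete multiplicativity, lambda(p) = -1
    return lam


def chowla_average(N):
    """E_{y1, y2 <= N} lambda(P(y1, y2)) for P = y1 y2 (y1 + y2)(y1 + 2 y2),
    evaluated as a correlation of lambda along the forms (y1, y2, y1 + y2, y1 + 2 y2)."""
    lam = liouville_table(3 * N)
    total = sum(lam[y1] * lam[y2] * lam[y1 + y2] * lam[y1 + 2 * y2]
                for y1 in range(1, N + 1) for y2 in range(1, N + 1))
    return total / N ** 2


def liouville_direct(n):
    """lambda(n) by trial division, for cross-checking on small inputs."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return (-1) ** count
```

Working with the factors rather than with $P(y_1, y_2)$ itself also keeps every argument of $\lambda$ below $3N$.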

10. Transferring the inverse Gowers-norm conjecture 

Recall that we are trying to use the inverse Gowers-norm and Mobius and nilsequences conjectures to prove Theorem 7.2. We cannot apply the Gowers Inverse conjecture directly to prove Theorem 7.2, because $\Lambda'_{b,W} - 1$ is not bounded uniformly in $N$. The difficulty here is similar to that encountered in [21], in which Szemeredi's theorem, which ostensibly only establishes multiple recurrence bounds for bounded functions, needed to be extended to an unbounded function such as $\Lambda'_{b,W}$. We will use a similar resolution to that in [21], namely to transfer the inverse Gowers-norm conjecture to the situation of a function bounded by a pseudorandom measure. More precisely, the purpose of this section is to prove the following result.

Proposition 10.1 (Relative inverse Gowers-norm conjecture). Assume the GI(s) conjecture. For any $0 < \delta \leq 1$ and any $C \geq 20$, there exists a finite collection $\mathcal{M}_{s,\delta,C}$ of nilmanifolds $G/\Gamma = (G/\Gamma, d_{G/\Gamma})$ with the following property. Let $N \geq 1$, suppose that $N' \in [CN, 2CN]$ is a prime, that $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ is an $(s + 2)2^{s+1}$-pseudorandom measure, that $f : [N] \to \mathbb{R}$ is a function with $|f(n)| \leq \nu(n)$ for all $n \in [N]$, and that $\|f\|_{U^{s+1}[N]} \geq \delta$. Then there exists $G/\Gamma \in \mathcal{M}_{s,\delta,C}$ together with a 1-bounded $s$-step nilsequence $(F(g^n x))_{n \in \mathbb{Z}}$ with Lipschitz constant $O_{s,\delta,C}(1)$, such that

$$|\mathbb{E}_{n \leq N} f(n) F(g^n x)| \gg_{s,C,\delta} 1.$$

Remarks. This looks significantly more complicated than the ordinary GI(s) conjecture, but this is something of an illusion. Most of the complexity comes from the need for the additional dependence on $C$. A largeish value of $C$ might be required in order to construct an appropriate pseudorandom measure on $\mathbb{Z}_{N'}$ (cf. Proposition 6.4) and so we leave $C$ unspecified in this proposition.

In view of Proposition 10.1 and Proposition 6.4 it is not hard to see that Theorem 7.2, and hence the Main Theorem, follows from the next proposition. All one need do is choose $C := \max(C_0((s + 2)2^{s+1}), 20)$, where $C_0$ is the function appearing in Proposition 6.4. This ensures that an appropriate pseudorandom measure $\nu$ can be constructed.

Proposition 10.2 (W-tricked von Mangoldt orthogonal to nilsequences). Let $s \geq 1$, and assume the MN(s) conjecture. Let $G/\Gamma = (G/\Gamma, d_{G/\Gamma})$ be an $s$-step nilmanifold with smooth metric $d_{G/\Gamma}$, and let $(F(g^n x))_{n \in [N]}$ be a bounded $s$-step nilsequence with Lipschitz constant $M$. Let $b \in [W]$ be coprime to $W$. Then we have the bound

$$\mathbb{E}_{n \in [N]} (\Lambda'_{b,W}(n) - 1) F(g^n x) = o_{M,G/\Gamma}(1).$$

Remark. In principle, Proposition 10.2 is substantially easier to establish than the preceding reductions of the Main Theorem, such as Theorem 7.2. This is because we are now computing the correlation of $\Lambda$ (or $\Lambda'_{b,W} - 1$) with respect to a "low complexity" sequence $F(g^n x)$, rather than the more complicated task of computing a multilinear correlation of $\Lambda$ with itself. In particular one can now hope to use tools such as Vinogradov's method to establish this proposition. Indeed, exponential sums such as $\sum_{n \in [N]} \Lambda(n) e(\alpha n)$, or more generally $\sum_{n \in [N]} \Lambda(n) e(\alpha n^s)$, are essentially model cases of Proposition 10.2 and are well known to be treatable by Vinogradov's method. However, Proposition 10.2 is somewhat more general, as it also (for example) asserts some control on generalised polynomial exponential sums such as $\sum_{n \in [N]} \Lambda(n) e(\alpha n \lfloor \beta n \rfloor)$, where $\lfloor \cdot \rfloor$ is the greatest integer function. See […] for further discussion of the link between such functions and 2-step nilsequences. Thus we see that the inverse Gowers-norm conjecture GI(s) is a powerful tool for establishing bounds on the Gowers norms $U^{s+1}$, and thence on all multilinear averages of complexity at most $s$.

We prove Proposition 10.2 in later sections. For the remainder of this section we derive Proposition 10.1 from the inverse Gowers-norm conjecture.

A Koopman-von Neumann theorem.⁴ The primary tool in deducing Proposition 10.1 from the Gowers Inverse conjecture is the following structure theorem, which allows us to decompose an arbitrary function $f$ which is bounded pointwise by $\nu$ into a bounded function and a Gowers-uniform function.

Proposition 10.3 (Koopman-von Neumann theorem). Let $s \geq 1$ and let $N' \geq N \geq 1$ be integers. Suppose that $\nu$ is an $(s + 2)2^{s+1}$-pseudorandom measure on $\mathbb{Z}_{N'}$, and that $f : \mathbb{Z}_{N'} \to \mathbb{R}$ is a function such that $|f(n)| \leq \nu(n)$ pointwise. Then we may decompose $f = f_1 + f_2$, where

$$\sup_{n \in \mathbb{Z}_{N'}} |f_1(n)| \leq 1 \tag{10.1}$$

and

$$\|f_2\|_{U^{s+1}(\mathbb{Z}_{N'})} = o(1). \tag{10.2}$$

If furthermore $f$ is supported in $\{-N, \ldots, N\}$ for some $N < N'/10$, then we may arrange matters so that $f_1$ and $f_2$ are both supported on $\{-2N, \ldots, 2N\}$.

⁴This term has something in common with the term "generalised von Neumann theorem" in that it originally came from analogies with ergodic theory. We now use it in our work to describe a range of theorems whose general aim is to decompose a given function $f$ into the sum of a function $f_1$ which is somehow less complicated than $f$, together with an error $f_2$ which is small in some Gowers norm.

Remark. Informally, this theorem asserts that in the $U^{s+1}$ topology, bounded functions are dense in the class of functions bounded by $\nu$. This fact (and refinements thereof), in conjunction with generalised von Neumann theorems such as Proposition 7.1, underlie the "transference principle" from [23], which allows one to convert results for multilinear averages of 1-bounded functions to results for multilinear averages of functions bounded by a pseudorandom measure. This principle is essential for our arguments here, as it allows us in many cases to manipulate functions such as $\Lambda'_{b,W}$ as if they were uniformly bounded.

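Proof. Let us first make the observation that we can weaken (10.1) to

$$\sup_{n \in \mathbb{Z}_{N'}} |f_1(n)| \leq 1 + o(1), \qquad (10.3)$$

since one could simply transfer the $o(1)$ error in (10.3) to the $f_2$ component afterwards, using the triangle inequality on (10.2) (a function of size $o(1)$ in $L^\infty$ also has $U^{s+1}(\mathbb{Z}_{N'})$ norm $o(1)$).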

We shall rely heavily on a similar result from [24, Proposition 8.1]. Before we give this result we need some notation.

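Definition 10.4 (Conditional expectation). If $f : \mathbb{Z}_{N'} \to \mathbb{R}$ is a function and $1 \leq p \leq \infty$, we denote $\|f\|_{L^p(\mathbb{Z}_{N'})} := (\mathbb{E}_{n \in \mathbb{Z}_{N'}} |f(n)|^p)^{1/p}$, with the usual convention that $\|f\|_{L^\infty(\mathbb{Z}_{N'})} := \sup_{n \in \mathbb{Z}_{N'}} |f(n)|$. If $\mathcal{B}$ is a $\sigma$-algebra on $\mathbb{Z}_{N'}$, that is to say the Boolean algebra generated by the atoms of a partition of $\mathbb{Z}_{N'}$, we define the conditional expectation $\mathbb{E}(f|\mathcal{B})$ of $f$ relative to $\mathcal{B}$ to be the orthogonal projection in $L^2(\mathbb{Z}_{N'})$ from $f$ to the $\mathcal{B}$-measurable functions.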
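Concretely, $\mathbb{E}(f|\mathcal{B})$ is the function which, on each atom of the partition, equals the average of $f$ over that atom. The Python sketch below is our own illustration (the function names are hypothetical); it uses exact rational arithmetic and checks the defining property that $f - \mathbb{E}(f|\mathcal{B})$ is orthogonal to every $\mathcal{B}$-measurable function, which is enough to test against the indicators of the atoms.

```python
from fractions import Fraction


def conditional_expectation(f, atoms):
    """Return E(f|B), where B is generated by the partition `atoms` of Z_{N'}.

    f     -- list of values f(0), ..., f(N'-1)
    atoms -- disjoint lists of indices whose union is range(len(f))
    On each atom the result is constant, equal to the average of f there.
    """
    result = [None] * len(f)
    for atom in atoms:
        avg = Fraction(sum(f[i] for i in atom), len(atom))
        for i in atom:
            result[i] = avg
    return result


def inner(u, v):
    """Normalised L^2 inner product E_n u(n) v(n) on Z_{N'}."""
    return Fraction(sum(a * b for a, b in zip(u, v)), len(u))


f = [3, -1, 4, 1, 5, -9, 2, 6]
atoms = [[0, 1, 2], [3, 4], [5, 6, 7]]
Ef = conditional_expectation(f, atoms)
residual = [a - b for a, b in zip(f, Ef)]
# f - E(f|B) is orthogonal to the indicator of every atom,
# hence to every B-measurable function.
for atom in atoms:
    indicator = [1 if i in atom else 0 for i in range(len(f))]
    assert inner(residual, indicator) == 0
```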

In our current notation, Proposition 8.1 from [21] asserts the following.

Proposition 8.1 of [21]. Suppose that $N' \geq N$ and that $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ is an $(s+2)2^{s+1}$-pseudorandom measure. Let $f : \mathbb{Z}_{N'} \to \mathbb{R}$ be such that $|f(n)| \leq \nu(n)$ for all $n \in \mathbb{Z}_{N'}$. Let $\varepsilon \in (0,1)$ be a small parameter, and assume $N'$ is sufficiently large depending on $\varepsilon$. Then there exists a $\sigma$-algebra $\mathcal{B}$ and an exceptional set $\Omega \in \mathcal{B}$ such that

• (smallness condition)

$$\mathbb{E}_{\mathbb{Z}_{N'}}(\nu 1_\Omega) = o_\varepsilon(1); \qquad (10.4)$$

• ($\nu$ is uniformly distributed outside of $\Omega$)

$$\|(1 - 1_\Omega)\mathbb{E}(\nu - 1|\mathcal{B})\|_{L^\infty(\mathbb{Z}_{N'})} = o_\varepsilon(1) \qquad (10.5)$$

and

• (Gowers uniformity estimate)

$$\|(1 - 1_\Omega)(f - \mathbb{E}(f|\mathcal{B}))\|_{U^{s+1}(\mathbb{Z}_{N'})} \leq \varepsilon^{1/2^{s+1}} =: \kappa_s(\varepsilon). \qquad (10.6)$$

^In [24] the result is only stated when $0 \leq f(n) \leq \nu(n)$, but exactly the same proof applies under the more general assumption that $|f(n)| \leq \nu(n)$. In any case, in order to prove Proposition 10.3 one could always decompose $f$ into non-negative and negative parts $f^+ + f^-$ and follow the proof for each part separately. The key point to note is that the function $f_1^+$ is non-negative, whilst $f_1^- \leq 0$. Thus $f_1 = f_1^+ + f_1^-$ satisfies the requisite $L^\infty$ bound (10.3).

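Let $\varepsilon$ be chosen later (it will eventually be a slowly decaying function of $N'$). If $N'$ is sufficiently large depending on $\varepsilon$, we can invoke the above theorem. Write

$$f = f_1 + f_2 = f_1 + f_2^{(1)} + f_2^{(2)},$$

where

$$f_1 := (1 - 1_\Omega)\mathbb{E}(f|\mathcal{B}), \qquad f_2^{(1)} := (1 - 1_\Omega)(f - \mathbb{E}(f|\mathcal{B}))$$

and

$$f_2^{(2)} := 1_\Omega f.$$

Then by (10.5), since $|\mathbb{E}(f|\mathcal{B})| \leq \mathbb{E}(\nu|\mathcal{B}) = 1 + \mathbb{E}(\nu - 1|\mathcal{B})$ pointwise, we have

$$\|f_1\|_{L^\infty(\mathbb{Z}_{N'})} \leq 1 + o_\varepsilon(1). \qquad (10.7)$$

Also, by (10.6) we have

$$\|f_2^{(1)}\|_{U^{s+1}(\mathbb{Z}_{N'})} \leq \kappa_s(\varepsilon). \qquad (10.8)$$

Next, we claim that

$$\|f_2^{(2)}\|_{U^{s+1}(\mathbb{Z}_{N'})} = o_\varepsilon(1). \qquad (10.9)$$

To see this, first note that from (10.4), since $|f_2^{(2)}| \leq \nu 1_\Omega$, we have

$$\|f_2^{(2)}\|_{L^1(\mathbb{Z}_{N'})} = o_\varepsilon(1). \qquad (10.10)$$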
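Secondly, we prove that for functions $g$ for which $|g|$ is bounded pointwise by a pseudorandom measure $\nu$, the $L^1(\mathbb{Z}_{N'})$ norm controls the $U^{s+1}(\mathbb{Z}_{N'})$-norm. Indeed for such a function we have

$$\|g\|_{U^{s+1}(\mathbb{Z}_{N'})}^{2^{s+1}} \leq \mathbb{E}_{n \in \mathbb{Z}_{N'}} |g(n)| \sup_{m \in \mathbb{Z}_{N'}} \mathbb{E}_{h \in \mathbb{Z}_{N'}^{s+1}} \prod_{\omega \in \{0,1\}^{s+1} \setminus \{0\}} \nu(m + \omega \cdot h) = \|\mathcal{D}\nu\|_{L^\infty(\mathbb{Z}_{N'})} \|g\|_{L^1(\mathbb{Z}_{N'})},$$

where

$$\mathcal{D}\nu(n) := \mathbb{E}_{h \in \mathbb{Z}_{N'}^{s+1}} \prod_{\omega \in \{0,1\}^{s+1} \setminus \{0\}} \nu(n + \omega \cdot h)$$

is the dual function associated to $\nu$. However a simple application of the linear forms condition (for each fixed $n$, the forms $h \mapsto n + \omega \cdot h$ with $\omega \neq 0$ are pairwise linearly independent), given in detail in [..., Lemma 6.1], confirms that

$$\|\mathcal{D}\nu\|_{L^\infty(\mathbb{Z}_{N'})} \leq 1 + o(1).$$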
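The inequality above uses nothing about $\nu$ beyond non-negativity and $|g| \leq \nu$; pseudorandomness enters only through the bound on the dual function $\mathcal{D}\nu(n) = \mathbb{E}_h \prod_{\omega \neq 0} \nu(n + \omega\cdot h)$. The brute-force Python check below is our own sketch on a small cyclic group, with real-valued $g$ and a randomly chosen non-negative $\nu$ that is certainly not pseudorandom; it tests $\|g\|_{U^k}^{2^k} \leq \|\mathcal{D}\nu\|_{L^\infty}\|g\|_{L^1}$ for $k = s + 1 = 2$.

```python
import itertools
import random


def shifted(n, w, h, N):
    """The point n + w.h of Z_N."""
    return (n + sum(wi * hi for wi, hi in zip(w, h))) % N


def gowers_power(g, k):
    """||g||_{U^k(Z_N)}^{2^k} for real-valued g, by brute force."""
    N = len(g)
    cube = list(itertools.product((0, 1), repeat=k))
    total = 0.0
    for n in range(N):
        for h in itertools.product(range(N), repeat=k):
            prod = 1.0
            for w in cube:
                prod *= g[shifted(n, w, h, N)]
            total += prod
    return total / N ** (k + 1)


def dual_function(nu, k):
    """D nu(n) = E_h prod over w in {0,1}^k, w != 0, of nu(n + w.h)."""
    N = len(nu)
    cube_star = [w for w in itertools.product((0, 1), repeat=k) if any(w)]
    values = []
    for n in range(N):
        acc = 0.0
        for h in itertools.product(range(N), repeat=k):
            prod = 1.0
            for w in cube_star:
                prod *= nu[shifted(n, w, h, N)]
            acc += prod
        values.append(acc / N ** k)
    return values


random.seed(0)
N, k = 11, 2                                   # k plays the role of s + 1
nu = [random.uniform(0, 2) for _ in range(N)]  # any non-negative weight
g = [random.uniform(-1, 1) * v for v in nu]    # |g| <= nu pointwise
lhs = gowers_power(g, k)
rhs = max(dual_function(nu, k)) * sum(abs(x) for x in g) / N
assert lhs <= rhs + 1e-9
```

For a genuinely pseudorandom $\nu$ the factor $\max \mathcal{D}\nu$ would be $1 + o(1)$, which is exactly what turns (10.10) into (10.9).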

This concludes the proof of (10.9). From this, (10.8) and the triangle inequality for the $U^{s+1}(\mathbb{Z}_{N'})$ norm we conclude that

$$\|f_2\|_{U^{s+1}(\mathbb{Z}_{N'})} \leq o_\varepsilon(1) + \kappa_s(\varepsilon).$$

Choosing $\varepsilon$ to be a sufficiently slowly decaying function of $N'$ we obtain the first part of Proposition 10.3.
It remains to deal with the situation where $f$ is supported in $\{-N, \ldots, N\}$. We can write $f(n) = f(n)\psi(n)$, where $\psi : \mathbb{Z}_{N'} \to [0,1]$ equals 1 on $\{-N, \ldots, N\}$, vanishes outside of $\{-2N, \ldots, 2N\}$ and interpolates smoothly in the range $N \leq |n| \leq 2N$. One could, for example, take $\psi$ to be a de la Vallée Poussin kernel. If $f = f_1 + f_2$ is the previous decomposition, then upon multiplying by $\psi$ we obtain $f = \tilde f_1 + \tilde f_2$, where $\tilde f_1 := f_1\psi$ and $\tilde f_2 := f_2\psi$. The function $\tilde f_1$ continues to enjoy the bound (10.3) but now also has the desired support property. To confirm that $\tilde f_2$ enjoys the bound (10.2), simply use Fourier series to break $\psi$ up as a rapidly convergent linear combination of linear phases $e(n\xi/N')$, $\xi \in \mathbb{Z}_{N'}$, and use the triangle inequality combined with the phase invariance (B.11) of the $U^{s+1}$ norm. This concludes the proof of Proposition 10.3. □
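The phase invariance used above says that multiplying $f$ by a linear phase $e(n\xi/N')$ does not change $\|f\|_{U^{s+1}(\mathbb{Z}_{N'})}$ when $s + 1 \geq 2$, because the signed sum $\sum_{\omega}(-1)^{|\omega|}(n + \omega\cdot h)$ over the cube vanishes. A brute-force Python check (our own sketch; the names `gowers_power` and `modulate` are hypothetical):

```python
import cmath
import itertools
import random


def gowers_power(f, k):
    """||f||_{U^k(Z_N)}^{2^k} for complex-valued f, by brute force.

    Vertices w of the cube with |w| odd carry a complex conjugate.
    """
    N = len(f)
    cube = list(itertools.product((0, 1), repeat=k))
    total = 0j
    for n in range(N):
        for h in itertools.product(range(N), repeat=k):
            prod = 1 + 0j
            for w in cube:
                v = f[(n + sum(a * b for a, b in zip(w, h))) % N]
                prod *= v.conjugate() if sum(w) % 2 else v
            total += prod
    return (total / N ** (k + 1)).real


def modulate(f, xi):
    """Multiply f by the linear phase n -> e(n xi / N) = exp(2 pi i n xi / N)."""
    N = len(f)
    return [f[n] * cmath.exp(2j * cmath.pi * n * xi / N) for n in range(N)]


random.seed(2)
f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(7)]
for k in (2, 3):
    assert abs(gowers_power(modulate(f, 3), k) - gowers_power(f, k)) < 1e-9
```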

Proof of Proposition 10.1. Suppose that $N' \in [CN, 2CN]$ is prime, that $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ is an $(s+2)2^{s+1}$-pseudorandom measure, that $f : [N] \to \mathbb{R}$ is a function with $|f(n)| \leq \nu(n)$ for all $n \in [N]$ and that $\|f\|_{U^{s+1}[N]} \geq \delta$. Applying Proposition 10.3 we may decompose

$$f = f_1 + f_2,$$

where $\|f_1\|_{L^\infty(\mathbb{Z}_{N'})} \leq 1$ and $\|f_2\|_{U^{s+1}(\mathbb{Z}_{N'})} = o(1)$. Since $C > 10$, we may further assume that both $f_1$ and $f_2$ are supported in $\{-2N, \ldots, 2N\}$. By Lemma B.5, the assumption that $\|f\|_{U^{s+1}[N]} \geq \delta$ implies that $\|f\|_{U^{s+1}(\mathbb{Z}_{N'})} \gg_{C,\delta} 1$, and hence (for $N$ sufficiently large) that $\|f_1\|_{U^{s+1}(\mathbb{Z}_{N'})} \gg_{C,\delta} 1$. Applying Lemma B.5 once more, we conclude that $\|f_1\|_{U^{s+1}(\{-2N, \ldots, 2N\})} \gg_{C,\delta} 1$.

We now apply the inverse Gowers-norm conjecture GI(s), translating $\{-2N, \ldots, 2N\}$ to the interval $[4N+1]$, to conclude that there exists an $s$-step nilmanifold $G/\Gamma = (G/\Gamma, d_{G/\Gamma})$ from a fixed finite collection $\mathcal{M}_{s,\delta,C}$, together with a bounded $s$-step nilsequence $(F(g^n x))_{n \in \mathbb{N}}$ generated by this nilmanifold and with Lipschitz constant $O_{s,\delta,C}(1)$, such that

$$|\mathbb{E}_{-2N \leq n \leq 2N} f_1(n) F(g^n x)| \gg_{s,\delta,C} 1.$$

On the other hand, from (10.2) and the contrapositive of Proposition 8.2 we have

$$|\mathbb{E}_{-2N \leq n \leq 2N} f_2(n) F(g^n x)| = o_{G/\Gamma,s,\delta,C}(1).$$

Since $G/\Gamma$ ranges over a finite collection, if $N \geq N_0(s,\delta,C)$ is large depending on $s$, $\delta$ and $C$, we conclude that

$$|\mathbb{E}_{-2N \leq n \leq 2N} f(n) F(g^n x)| \gg_{s,\delta,C} 1,$$

and the claim follows (since $f$ is supported on $[N]$).

If by contrast $N = O_{s,\delta,C}(1)$ then the claim is trivial, since all norms on $[N]$ are then equivalent up to factors of $O_N(1) = O_{s,\delta,C}(1)$, and all functions on $[N]$ can be expressed as nilsequences (say on the torus $\mathbb{R}/\mathbb{Z}$) with Lipschitz constant $O_N(1) = O_{s,\delta,C}(1)$. □



An alternate way to proceed at this point is to modify the proof of [24, Proposition 8.1], where the $\sigma$-algebra $\mathcal{B}$ is initialised not at the trivial factor, but rather at the factor generated by $\{-N, \ldots, N\}$.
11. Averaging the nilsequence

To summarise so far, we have reduced the task of showing that the GI(s) conjecture implies the Main Theorem to the much easier task of establishing Proposition 10.2. This, recall, is an estimate on the correlation between the number-theoretic function $\Lambda'_{b,W}(n) - 1$ and the nilsequence $F(g^n x)$.

The purpose of this section is to perform a rather technical modification to the nilsequence $F(g^n x)$, which is necessary for the following reason. At a later stage in the proof we would like to discard certain "small" components of the function $\Lambda'_{b,W}(n) - 1$ from this correlation. Some of these components will be easy to discard; for instance, any error which is small in $L^1$ norm will be easily removed since the nilsequence is bounded. However, there will be one component of $\Lambda'_{b,W}(n) - 1$ that we shall encounter (namely, the term arising from the "smooth" component $\Lambda^\sharp$ of the von Mangoldt function) which will not be small in $L^1$, but is instead small in the Gowers norm $U^{s+1}[N]$. In principle, Proposition 8.2 or Corollary 11.6 would allow us to safely drop such terms. Unfortunately, a problem arises because the component of $\Lambda'_{b,W}(n) - 1$ that we are trying to discard is not bounded, and we have also not been able to dominate this component by a pseudorandom measure or even to establish a bound for it in $L^1$. To get around this problem, we need to improve the "regularity" of the nilsequence $F(g^n x)$. In particular we must convert it to an object which we can bound in the dual norm $U^{s+1}[N]^*$, defined as usual by the formula

$$\|F\|_{U^{s+1}[N]^*} := \sup\{|\mathbb{E}_{n \in [N]} f(n)F(n)| : \|f\|_{U^{s+1}[N]} \leq 1\}.$$

This dual norm also appeared in [21], and plays a similar role there as it does here.

It would be very pleasant if every $s$-step nilsequence were automatically bounded in the $U^{s+1}[N]^*$ norm. Unfortunately, this statement is false even in the $s = 1$ case, as in that case it amounts to a certain $\ell^{4/3}$ summability estimate on the Fourier coefficients of Lipschitz functions on a compact abelian group. There is no such estimate if the group is of sufficiently high dimension. Of course one can rectify this by replacing the Lipschitz functions with smooth functions. It seems likely that a similar claim is true for higher $s$, but it also seems likely that a proof would involve a finer analysis of the structure of nilmanifolds than we need for the rest of our argument.
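The $s = 1$ remark rests on the identity $\|f\|_{U^2(\mathbb{Z}_N)}^4 = \sum_{\xi \in \mathbb{Z}_N}|\hat f(\xi)|^4$, where $\hat f(\xi) := \mathbb{E}_{n \in \mathbb{Z}_N} f(n)e(-n\xi/N)$; by duality of $\ell^4$ and $\ell^{4/3}$, the dual $U^2$ norm on $\mathbb{Z}_N$ is then governed by the $\ell^{4/3}$ norm of the Fourier coefficients. The brute-force Python check below is our own sketch of the identity:

```python
import cmath
import itertools
import random


def u2_power(f):
    """||f||_{U^2(Z_N)}^4 = E_{n,h1,h2} f(n) conj f(n+h1) conj f(n+h2) f(n+h1+h2)."""
    N = len(f)
    total = 0j
    for n, h1, h2 in itertools.product(range(N), repeat=3):
        total += (f[n] * f[(n + h1) % N].conjugate()
                  * f[(n + h2) % N].conjugate() * f[(n + h1 + h2) % N])
    return (total / N ** 3).real


def fourier(f):
    """Normalised Fourier coefficients fhat(xi) = E_n f(n) e(-n xi / N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * n * xi / N) for n in range(N)) / N
            for xi in range(N)]


random.seed(3)
f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(9)]
fourth_moment = sum(abs(c) ** 4 for c in fourier(f))
assert abs(u2_power(f) - fourth_moment) < 1e-9
```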

Fortunately, however, we can achieve an adequate substitute result by replacing the concept of a nilsequence by its convex hull. Definition 11.1 makes this precise.

Definition 11.1 (Averaged nilsequences). Let $G/\Gamma = (G/\Gamma, d_{G/\Gamma})$ be an $s$-step nilmanifold, and let $M > 0$. An $s$-step averaged nilsequence on $G/\Gamma$ with Lipschitz constant at most $M$ is a function $F(n)$ having the form

$$F(n) = \mathbb{E}_{i \in I} F_i(g_i^n x_i),$$

where $I$ is some finite index set, and for each $i$, $(F_i(g_i^n x_i))_{n \in \mathbb{N}}$ is a bounded $s$-step nilsequence on $G/\Gamma$ with Lipschitz constant at most $M$.

Remark. An averaged nilsequence of the type just described is a genuine nilsequence on the nilmanifold $(G/\Gamma)^I$. However the averaging set $I$ will, in applications, have size growing with $N$, and so in our finitary world these averaged nilsequences should be thought of as a strict generalisation of the notion of a nilsequence. Were it not for the desire to avoid issues of measurability, we might even have replaced the finite averaging operator $\mathbb{E}_{i \in I}$ by an integration over a suitable probability space.

We now state the crucial technical lemma we need, which allows us to replace a nilsequence by an averaged nilsequence with a good $U^{s+1}[N]^*$ bound.

Proposition 11.2 (Decomposition of nilsequences). Let $G/\Gamma = (G/\Gamma, d_{G/\Gamma})$ be an $s$-step nilmanifold, and let $M > 0$. Suppose that $(F(g^n x))_{n \in \mathbb{N}}$ is a bounded $s$-step nilsequence on $G/\Gamma$ with Lipschitz constant at most $M$. Let $\varepsilon \in (0,1)$ and suppose that $N \geq 1$. Then we may effect the decomposition

$$F(g^n x) = F_1(n) + F_2(n), \qquad (11.1)$$

where $F_1 : \mathbb{N} \to [-1,1]$ is an averaged nilsequence on $(G/\Gamma)^{2^{s+1}-1}$ with Lipschitz constant $O_{M,\varepsilon,G/\Gamma}(1)$ and obeying the dual norm bound

$$\|F_1\|_{U^{s+1}[N]^*} \ll_{M,\varepsilon,G/\Gamma} 1, \qquad (11.2)$$

while $F_2 : \mathbb{N} \to \mathbb{R}$ obeys the uniform bound

$$\|F_2\|_\infty = O(\varepsilon). \qquad (11.3)$$

Remark. At present, our decomposition (11.1) depends on the parameter $N$. It is possible to modify the argument below in such a way that the decomposition is independent of $N$, but this requires generalising the notion of an averaged nilsequence by replacing the averaging over a finite set $I$ with an integral over a continuous probability measure. As this introduces some minor technical issues such as measurability, we shall settle for the slightly weaker formulation of Proposition 11.2 given above, as it still suffices for our application.

We shall prove Proposition 11.2 shortly. Assuming it for the moment, we may make yet another reduction of the Main Theorem. This we do by reducing Proposition 10.2 (which, as we have already shown, implies the Main Theorem) to the following result.

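Proposition 11.3 ($W$-tricked $\Lambda'$ orthogonal to averaged nilsequences). Let $s \geq 1$, and assume the MN(s) conjecture. Let $G/\Gamma = (G/\Gamma, d_{G/\Gamma})$ be an $s$-step nilmanifold with smooth metric $d_{G/\Gamma}$, and let $F_1(n)$ be an averaged $s$-step nilsequence on $G/\Gamma$ with Lipschitz constant $M$. Let $b \in [W]$ be coprime to $W$. Suppose we also have the dual norm bound

$$\|F_1\|_{U^{s+1}[N]^*} \leq M'. \qquad (11.4)$$

Then we have the bound

$$\mathbb{E}_{n \in [N]}(\Lambda'_{b,W}(n) - 1)F_1(n) = o_{M,M',G/\Gamma,s}(1).$$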

Indeed, to deduce Proposition 10.2 from Proposition 11.3, let $\varepsilon \in (0,1)$ be arbitrary and apply Proposition 11.2. The contribution of $F_2$ will be bounded by $O(\varepsilon) + o_\varepsilon(1)$ thanks to (1.5) and (11.3). The contribution of $F_1$ can be controlled using Proposition 11.3. Putting these estimates together leads to the bound

$$\mathbb{E}_{n \in [N]}(\Lambda'_{b,W}(n) - 1)F(g^n x) = o_{M,G/\Gamma,s,\varepsilon}(1) + o_\varepsilon(1) + O(\varepsilon).$$

Letting $\varepsilon$ go to zero sufficiently slowly, we obtain the claim.

In later sections we shall prove Proposition 11.3. For now we turn to the task of proving Proposition 11.2.

Proof of Proposition 11.2. Fix $G/\Gamma$, $s$, $M$. Observe that if we have proven the proposition for a single Lipschitz function $F$, then if we perturb $F$ in the $L^\infty$ norm by $\varepsilon$, the statement is still true for the perturbed function (with slightly worse implied constants in the $O()$ notation). On the other hand, since $G/\Gamma$ is a compact metric space, we know from the Arzelà-Ascoli theorem that the space of functions $F : G/\Gamma \to [-1,1]$ with Lipschitz constant at most $M$, being uniformly bounded, equicontinuous and closed, is compact in the uniform topology. In particular, it can be covered by finitely many balls in the uniform metric of radius $\varepsilon/2$, say. In view of this compactness, we see that it will suffice to establish the qualitative version of the proposition, namely that given any continuous function $F$ (not necessarily Lipschitz) and any $\varepsilon > 0$, we have a decomposition (11.1) for all $N \geq 1$, $g \in G$ and $x \in G/\Gamma$, where $F_1$ is an averaged nilsequence on $G/\Gamma$ with Lipschitz constant uniform in $g$, $x$, $N$, and with dual norm $\|F_1\|_{U^{s+1}[N]^*}$ bounded uniformly in $N$, $g$, $x$, and $F_2$ obeys the bound (11.3).

Fix $F$ and $\varepsilon$. To proceed further we need to detect some "constraints" on the orbit $n \mapsto g^n x$ in $G/\Gamma$. The most convenient framework for giving such constraints will be the $(s+1)$-dimensional parallelepipeds in $G/\Gamma$, as studied in [32].

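Definition 11.4 (Parallelepipeds in nilmanifolds). Let $(G/\Gamma)^{\{0,1\}^{s+1}}$ denote the space of all $2^{s+1}$-tuples $(x_\omega)_{\omega \in \{0,1\}^{s+1}}$ of points in $G/\Gamma$. An $(s+1)$-dimensional parallelepiped is any element of $(G/\Gamma)^{\{0,1\}^{s+1}}$ taking the form

$$(g^{n + \omega\cdot h}x)_{\omega \in \{0,1\}^{s+1}}$$

for some $g \in G$, $x \in G/\Gamma$, $n \in \mathbb{Z}$, and $h \in \mathbb{Z}^{s+1}$. Here, and for the remainder of the paper, we write $\omega \cdot h := \omega_1h_1 + \cdots + \omega_{s+1}h_{s+1}$, where $\omega = (\omega_1, \ldots, \omega_{s+1})$ and $h = (h_1, \ldots, h_{s+1})$.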

A fundamental property of $s$-step nilmanifolds is that the value of any one vertex of a parallelepiped (say, the zero vertex $x_{0^{s+1}}$, where $0^{s+1} := (0, \ldots, 0)$) is determined "continuously" by all the other vertices. In the following proposition, and for the remainder of the paper, write $\{0,1\}^{s+1}_* := \{0,1\}^{s+1} \setminus \{0^{s+1}\}$.

Proposition 11.5 (Parallelepiped constraint). There exists a compact set

$$\Sigma \subseteq (G/\Gamma)^{\{0,1\}^{s+1}_*}$$

and a continuous function $P : \Sigma \to G/\Gamma$ such that, for any $(s+1)$-dimensional parallelepiped $(x_\omega)_{\omega \in \{0,1\}^{s+1}}$, we have $(x_\omega)_{\omega \in \{0,1\}^{s+1}_*} \in \Sigma$ and the constraint

$$x_{0^{s+1}} = P((x_\omega)_{\omega \in \{0,1\}^{s+1}_*}).$$

^One could also use the compactness of $G/\Gamma$ to remove the requirement that all bounds be uniform in $x$. However the parameter $g$ ranges over the non-compact group $G$ and cannot be eliminated so easily; the range of the parameter $n$ is similarly non-compact. Thus we will be forced to look for constraints in the orbit $g^n x$ which are independent of $g$ and $n$. This helps motivate our introduction of cubes below.
This proposition is a topological and algebraic statement about the structure of nilmanifolds, and it was essentially proved in [32]. We supply a complete and self-contained proof in Appendix E, taking the opportunity to introduce the Host-Kra cube groups. A closely related statement regarding arithmetic progressions in nilmanifolds appeared in [26, Lemma 12.7]; results of this latter type seem to have been around in the ergodic theory community for some time and feature, for instance, in the papers of Furstenberg [13, 14].

For now, we shall simply illustrate this proposition with two model examples before continuing with the proof of Proposition 11.2.

Example 14 (Abelian shift). Take $s = 1$, let $G$ be an abelian Lie group, and let $\Gamma$ be a cocompact lattice in $G$. Thus $G/\Gamma$ is a compact abelian Lie group, and the action of any $g \in G$ on $G/\Gamma$ has the form of a shift $x \mapsto x + g$. Of course, $G/\Gamma$ is a 1-step nilmanifold. A 2-dimensional parallelepiped in this nilmanifold takes the form

$$(x + ng,\ x + (n + h_1)g,\ x + (n + h_2)g,\ x + (n + h_1 + h_2)g).$$

The first vertex is a function of the other three. In the notation of Proposition 11.5 we can take $\Sigma := (G/\Gamma)^3$ and $P : \Sigma \to G/\Gamma$ to be the map $P(y_{10}, y_{01}, y_{11}) := y_{10} + y_{01} - y_{11}$, and we easily verify that $y_{00} = P(y_{10}, y_{01}, y_{11})$ whenever $(y_{00}, y_{10}, y_{01}, y_{11})$ is a 2-dimensional parallelepiped.
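The abelian constraint is easy to test numerically. In the Python sketch below (our own; we take $G/\Gamma = (\mathbb{R}/\mathbb{Z})^2$ for concreteness, which is an assumption), the map $P(y_{10}, y_{01}, y_{11}) = y_{10} + y_{01} - y_{11}$ recovers the zero vertex of randomly chosen 2-dimensional parallelepipeds.

```python
import random


def to_torus(v):
    """Reduce each coordinate mod 1, giving a point of (R/Z)^d."""
    return tuple(c % 1.0 for c in v)


def orbit_point(x, g, m):
    """The point x + m g of the torus."""
    return to_torus([xi + m * gi for xi, gi in zip(x, g)])


def P(y10, y01, y11):
    """P(y10, y01, y11) = y10 + y01 - y11, computed on the torus."""
    return to_torus([a + b - c for a, b, c in zip(y10, y01, y11)])


def torus_distance(u, v):
    """Sup-distance on (R/Z)^d."""
    return max(min((a - b) % 1.0, (b - a) % 1.0) for a, b in zip(u, v))


random.seed(4)
for _ in range(100):
    x = (random.random(), random.random())
    g = (random.random(), random.random())
    n, h1, h2 = (random.randint(-50, 50) for _ in range(3))
    y00 = orbit_point(x, g, n)
    y10 = orbit_point(x, g, n + h1)
    y01 = orbit_point(x, g, n + h2)
    y11 = orbit_point(x, g, n + h1 + h2)
    assert torus_distance(P(y10, y01, y11), y00) < 1e-9
```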

Example 15 (Skew shift). For the sake of illustration, we consider a quotient $G/\Gamma$ where $G$ is 2-step nilpotent but not connected. The way we have set things up in this paper, then, $G/\Gamma$ does not qualify as a nilmanifold; however one can modify this example so that it genuinely takes place in a nilmanifold (cf. the proof of Proposition 8.4).

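Set

$$G := \begin{pmatrix} 1 & \mathbb{Z} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad \Gamma := \begin{pmatrix} 1 & \mathbb{Z} & \mathbb{Z} \\ 0 & 1 & \mathbb{Z} \\ 0 & 0 & 1 \end{pmatrix}.$$

Then $G$ is 2-step nilpotent, and $G/\Gamma$ may be identified with the torus $(\mathbb{R}/\mathbb{Z})^2$ via the map

$$(x, y) \mapsto \begin{pmatrix} 1 & 0 & y \\ 0 & 1 & x \\ 0 & 0 & 1 \end{pmatrix}\Gamma.$$

Taking $g := \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & \alpha \\ 0 & 0 & 1 \end{pmatrix}$, it is easy to check that the action of $g$ on $G/\Gamma$ is given by $(x, y) \mapsto (x + \alpha, y + x)$. The 3-dimensional parallelepipeds of this nilflow take the form

$$\Big(x + (n + \omega\cdot h)\alpha,\ y + \tfrac{1}{2}(n + \omega\cdot h)(n + \omega\cdot h - 1)\alpha + (n + \omega\cdot h)x\Big)_{\omega \in \{0,1\}^3}.$$

The key point to note here is that the first coordinate is at most linear in $n, h$, while the second coordinate is at most quadratic. Take the set $\Sigma$ to be the set of all 7-tuples $((x_\omega, y_\omega))_{\omega \in \{0,1\}^3_*}$ satisfying the linear constraints

$$x_{111} + x_{100} = x_{110} + x_{101}, \qquad x_{111} + x_{010} = x_{110} + x_{011}, \qquad x_{111} + x_{001} = x_{101} + x_{011}.$$

The map $P : \Sigma \to (\mathbb{R}/\mathbb{Z})^2$ is given by the alternating sum

$$P\big((x_\omega, y_\omega)_{\omega \in \{0,1\}^3_*}\big) := \sum_{\omega \in \{0,1\}^3_*} (-1)^{|\omega| - 1}(x_\omega, y_\omega), \qquad |\omega| := \omega_1 + \omega_2 + \omega_3.$$

This is ultimately a reflection of the fact that linear and quadratic functions have vanishing third derivative. Note, in contrast to the previous example, that for the skew shift a vertex of a 2-dimensional parallelepiped is not determined continuously by the other three vertices.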
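The skew-shift example can be checked in the same way. The Python sketch below (our own illustration) iterates $T(x, y) = (x + \alpha, y + x)$, confirms the closed form $T^m(x, y) = (x + m\alpha,\ y + mx + \binom{m}{2}\alpha)$, and verifies on a sample parallelepiped that the alternating sum $\sum_{\omega \neq 0}(-1)^{|\omega|-1}(x_\omega, y_\omega)$ returns the zero vertex, and that the first coordinates satisfy $x_{111} + x_{100} = x_{110} + x_{101}$ and its two analogues (which hold because $x_\omega$ is affine in $\omega$).

```python
import itertools
import random


def T(point, alpha):
    """One step of the skew shift (x, y) -> (x + alpha, y + x) on (R/Z)^2."""
    x, y = point
    return ((x + alpha) % 1.0, (y + x) % 1.0)


def orbit(x, y, alpha, m):
    """Closed form T^m(x, y) = (x + m alpha, y + m x + m(m-1)/2 alpha), any integer m."""
    return ((x + m * alpha) % 1.0, (y + m * x + (m * (m - 1) // 2) * alpha) % 1.0)


def dist(a, b):
    """Distance between two points of R/Z."""
    return min((a - b) % 1.0, (b - a) % 1.0)


def alternating_P(vertices):
    """Sum over w != 0 of (-1)^(|w|-1) (x_w, y_w), reduced mod 1."""
    px = py = 0.0
    for w, (xw, yw) in vertices.items():
        sign = 1 if sum(w) % 2 == 1 else -1
        px += sign * xw
        py += sign * yw
    return (px % 1.0, py % 1.0)


random.seed(5)
x, y, alpha = random.random(), random.random(), random.random()
point = (x, y)
for m in range(1, 8):  # the closed form agrees with iteration
    point = T(point, alpha)
    assert dist(point[0], orbit(x, y, alpha, m)[0]) < 1e-9
    assert dist(point[1], orbit(x, y, alpha, m)[1]) < 1e-9

n, h = 3, (2, 5, 7)
cube_star = [w for w in itertools.product((0, 1), repeat=3) if any(w)]
verts = {w: orbit(x, y, alpha, n + sum(a * b for a, b in zip(w, h))) for w in cube_star}
px, py = alternating_P(verts)
x0, y0 = orbit(x, y, alpha, n)
assert dist(px, x0) < 1e-9 and dist(py, y0) < 1e-9

X = {w: v[0] for w, v in verts.items()}
constraints = [
    ((1, 1, 1), (1, 0, 0), (1, 1, 0), (1, 0, 1)),
    ((1, 1, 1), (0, 1, 0), (1, 1, 0), (0, 1, 1)),
    ((1, 1, 1), (0, 0, 1), (1, 0, 1), (0, 1, 1)),
]
for a, b, c, d in constraints:  # x_a + x_b = x_c + x_d on the torus
    assert dist((X[a] + X[b]) % 1.0, (X[c] + X[d]) % 1.0) < 1e-9
```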

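Now we return to the task of proving Proposition 11.2. Let $P$ and $\Sigma$ be as in Proposition 11.5. The function $x \mapsto F(P(x))$ is continuous on the compact metric space $\Sigma$. By the Stone-Weierstrass theorem, we may approximate this function to uniform accuracy $O(\varepsilon)$ by a finite linear combination of tensor products of bounded Lipschitz functions on $G/\Gamma$, obtaining the uniform approximation

$$F(P(x)) = \sum_{\alpha \in A} \prod_{\omega \in \{0,1\}^{s+1}_*} H_{\omega,\alpha}(x_\omega) + O(\varepsilon)$$

for some finite index set $A$ and some 1-bounded Lipschitz functions $H_{\omega,\alpha} : G/\Gamma \to [-1,1]$. In particular, since $(g^{n+\omega\cdot h}x)_{\omega \in \{0,1\}^{s+1}}$ is a parallelepiped and the image under $P$ of its non-zero vertices is $g^n x$, we have

$$F(g^n x) = \sum_{\alpha \in A} \prod_{\omega \in \{0,1\}^{s+1}_*} H_{\omega,\alpha}(g^{n+\omega\cdot h}x) + O(\varepsilon)$$

for all $g \in G$, $x \in G/\Gamma$, $n \in \mathbb{Z}$, and $h \in \mathbb{Z}^{s+1}$.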

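Now we introduce the parameter $N \geq 1$ and average the $h$ parameter over the box $[N]^{s+1}$. In fact it is necessary to perform this averaging somewhat smoothly, to which end we take a smooth cutoff function $a : \mathbb{R} \to [0,1]$ which is supported on $[-1, 2]$ and equals 1 on $[0,1]$, write $Z_N := \sum_{h \in \mathbb{Z}^{s+1}} a(h_1/N)\cdots a(h_{s+1}/N)$ (so that $N^{s+1} \leq Z_N \leq (3N+1)^{s+1}$), and then set

$$F(g^n x) = F_1(n) + F_2(n),$$

where

$$F_1(n) := \sum_{\alpha \in A} \frac{1}{Z_N}\sum_{h \in \mathbb{Z}^{s+1}} a(h_1/N) \cdots a(h_{s+1}/N) \prod_{\omega \in \{0,1\}^{s+1}_*} H_{\omega,\alpha}(g^{n+\omega\cdot h}x)$$

and $F_2(n) = O(\varepsilon)$. In particular, we have $\|F_1\|_\infty \leq 1 + O(\varepsilon)$ since $F$ is bounded by 1. By shrinking the Lipschitz functions $H_{\omega,\alpha}$ by a multiplicative factor of $1 - O(\varepsilon)$, and transferring the error over to $F_2$, we may in fact ensure that $\|F_1\|_\infty \leq 1$.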

Now observe that for each fixed $\alpha$, $\omega$ and $h$ the function given by $n \mapsto H_{\omega,\alpha}(g^{n+\omega\cdot h}x) = H_{\omega,\alpha}(g^n(g^{\omega\cdot h}x))$ is a Lipschitz nilsequence on the $s$-step nilmanifold $G/\Gamma$, with Lipschitz constant independent of $N$, $g$ and $x$. We remarked earlier that the Lipschitz nilsequences form an algebra in a certain sense. From this remark we conclude that $F_1$ is an averaged Lipschitz $s$-step nilsequence on the product space $(G/\Gamma)^{\{0,1\}^{s+1}_*}$, again with Lipschitz constant independent of $N$, $g$ and $x$. To conclude the proof it suffices to show that $F_1$ is also bounded in $U^{s+1}[N]^*$ uniformly in $N$, $g$ and $x$. By the triangle inequality and the definition of the $U^{s+1}[N]^*$ norm, it thus suffices to show that the absolute value of

$$\mathbb{E}_{n \in [N]} f(n) \frac{1}{Z_N}\sum_{h \in \mathbb{Z}^{s+1}} a(h_1/N) \cdots a(h_{s+1}/N) \prod_{\omega \in \{0,1\}^{s+1}_*} H_{\omega,\alpha}(g^{n+\omega\cdot h}x) \qquad (11.5)$$

is uniformly bounded in $N$, $g$, $x$ whenever $f : [N] \to \mathbb{R}$ satisfies $\|f\|_{U^{s+1}[N]} \leq 1$. From this point onwards we do not care what the functions $n \mapsto H_{\omega,\alpha}(g^n x)$ actually are: it is merely important that they are 1-bounded. For that reason we write $b_\omega(n) := H_{\omega,\alpha}(g^n x)$, whereupon the quantity (11.5) that we are to show is uniformly bounded becomes

$$\mathbb{E}_{n \in [N]} f(n) \frac{1}{Z_N}\sum_{h \in \mathbb{Z}^{s+1}} a(h_1/N) \cdots a(h_{s+1}/N) \prod_{\omega \in \{0,1\}^{s+1}_*} b_\omega(n + \omega\cdot h). \qquad (11.6)$$

At this point we transfer to a group $\mathbb{Z}_{N'}$ where $N' = 10sN$ (say). Slightly abusing notation, the expression (11.6) is, up to factors of $O_s(1)$, equal to

$$\mathbb{E}_{n \in \mathbb{Z}_{N'};\, h \in \mathbb{Z}_{N'}^{s+1}} f(n) a(h_1/N) \cdots a(h_{s+1}/N) \prod_{\omega \in \{0,1\}^{s+1}_*} b_\omega(n + \omega\cdot h). \qquad (11.7)$$

Here we have extended $f$ from $[N]$ to all of $\mathbb{Z}_{N'}$ by defining it to be zero outside of $[N]$.

^One could take a limit here as $N \to \infty$, using an ergodic theorem to ensure suitable convergence; this would make the decomposition $F = F_1 + F_2$ independent of $N$, but at the cost of replacing the finite averaging in the definition of an averaged nilsequence with an infinite one. We omit the details.
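Now by taking a Fourier expansion on $\mathbb{Z}_{N'}^{s+1}$ we may write

$$a(h_1/N) \cdots a(h_{s+1}/N) = \sum_{r_1, \ldots, r_{s+1} \in \mathbb{Z}_{N'}} c_{r_1,\ldots,r_{s+1}}\, e((r_1h_1 + \cdots + r_{s+1}h_{s+1})/N').$$

By choosing the cutoff $a$ to be sufficiently smooth, we may ensure that

$$\sum_{r_1, \ldots, r_{s+1}} |c_{r_1,\ldots,r_{s+1}}| = O_s(1).$$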
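The coefficients $c_{r_1,\ldots,r_{s+1}}$ are products of one-dimensional coefficients, so their $\ell^1$ norm is the $(s+1)$-st power of the one-dimensional one, and it suffices to see that the latter stays bounded as $N$ grows. The Python sketch below is our own illustration: the particular cutoff, built from the standard $C^\infty$ step $\varphi(t)/(\varphi(t) + \varphi(1-t))$ with $\varphi(t) = e^{-1/t}$ for $t > 0$, and the choice $N' = 5N$ are assumptions made for the example.

```python
import cmath
import math


def smooth_step(t):
    """C-infinity function: 0 for t <= 0, 1 for t >= 1, increasing in between."""
    def phi(u):
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return phi(t) / (phi(t) + phi(1.0 - t))


def cutoff(t):
    """A smooth a : R -> [0, 1] supported on [-1, 2] and equal to 1 on [0, 1]."""
    if t <= 0.0:
        return smooth_step(t + 1.0)
    if t <= 1.0:
        return 1.0
    return smooth_step(2.0 - t)


def fourier_l1(N, N_prime):
    """sum_r |c_r|, where a(h/N) = sum_{r in Z_{N'}} c_r e(r h / N').

    Residues h are represented by integers in (-N'/2, N'/2]; N' > 4N ensures
    that the support [-N, 2N] of h -> a(h/N) does not wrap around.
    """
    reps = [h if h <= N_prime // 2 else h - N_prime for h in range(N_prime)]
    vals = [cutoff(h / N) for h in reps]
    total = 0.0
    for r in range(N_prime):
        c_r = sum(v * cmath.exp(-2j * cmath.pi * r * h / N_prime)
                  for v, h in zip(vals, reps)) / N_prime
        total += abs(c_r)
    return total


for N in (8, 16, 32):
    print(N, round(fourier_l1(N, 5 * N), 3))
```

The printed sums should stay of size $O(1)$ as $N$ doubles; heuristically they approximate $\int_{\mathbb{R}} |\hat a(\xi)|\,d\xi$, which is finite because $a$ is smooth. A merely discontinuous cutoff such as $1_{[0,1]}$ would instead give sums growing like $\log N$.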

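Thus to show that (11.7) is uniformly bounded it suffices to show the same for

$$\mathbb{E}_{n \in \mathbb{Z}_{N'};\, h \in \mathbb{Z}_{N'}^{s+1}} f(n)\, e((r_1h_1 + \cdots + r_{s+1}h_{s+1})/N') \prod_{\omega \in \{0,1\}^{s+1}_*} b_\omega(n + \omega\cdot h) \qquad (11.8)$$

for all $r_1, \ldots, r_{s+1} \in \mathbb{Z}_{N'}$. The exponential may be split up and incorporated into the $b_\omega(\cdot)$ terms: writing $r_ih_i = r_i(n + h_i) - r_in$, the factor $e(r_i(n + h_i)/N')$ can be absorbed into $b_\omega$ for the vertex $\omega$ with $\omega\cdot h = h_i$, and the remaining phase $e(-(r_1 + \cdots + r_{s+1})n/N')$ into $f$; this keeps each $b_\omega$ 1-bounded and, by phase invariance, does not change $\|f\|_{U^{s+1}(\mathbb{Z}_{N'})}$. Therefore we have reduced the matter to placing a bound on

$$\mathbb{E}_{n \in \mathbb{Z}_{N'};\, h \in \mathbb{Z}_{N'}^{s+1}} f(n) \prod_{\omega \in \{0,1\}^{s+1}_*} b_\omega(n + \omega\cdot h). \qquad (11.9)$$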
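The passage from (11.8) to (11.9) is pure algebra. The Python sketch below (our own, for $s = 1$, with randomly chosen real $f$ and 1-bounded $b_\omega$; the helper names are hypothetical) absorbs the phases exactly as described and confirms that the two averages agree and that the new $b_\omega$ remain 1-bounded.

```python
import cmath
import itertools
import random


def e(t):
    """e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * cmath.pi * t)


CUBE_STAR = [(1, 0), (0, 1), (1, 1)]  # {0,1}^2 minus the zero vertex (s = 1)


def average(f, b, r, Np):
    """E_{n,h in Z_N'} f(n) e((r1 h1 + r2 h2)/N') prod_w b_w(n + w.h)."""
    total = 0j
    for n, h1, h2 in itertools.product(range(Np), repeat=3):
        term = f[n] * e((r[0] * h1 + r[1] * h2) / Np)
        for w in CUBE_STAR:
            term *= b[w][(n + w[0] * h1 + w[1] * h2) % Np]
        total += term
    return total / Np ** 3


def absorb_phases(f, b, r, Np):
    """Use r_i h_i = r_i (n + h_i) - r_i n to move the phases into f and b_w."""
    f_new = [f[n] * e(-(r[0] + r[1]) * n / Np) for n in range(Np)]
    b_new = dict(b)
    b_new[(1, 0)] = [b[(1, 0)][m] * e(r[0] * m / Np) for m in range(Np)]
    b_new[(0, 1)] = [b[(0, 1)][m] * e(r[1] * m / Np) for m in range(Np)]
    return f_new, b_new


random.seed(6)
Np, r = 7, (2, 5)
f = [random.uniform(-1, 1) for _ in range(Np)]
b = {w: [random.uniform(-1, 1) for _ in range(Np)] for w in CUBE_STAR}
f_new, b_new = absorb_phases(f, b, r, Np)
# (11.8) with phases r equals (11.9) (zero phases) for the modified functions.
assert abs(average(f, b, r, Np) - average(f_new, b_new, (0, 0), Np)) < 1e-9
assert all(abs(v) <= 1 + 1e-12 for w in CUBE_STAR for v in b_new[w])
```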

Now we are assuming that $\|f\|_{U^{s+1}[N]} \leq 1$. By Lemma B.5 this implies that $\|f\|_{U^{s+1}(\mathbb{Z}_{N'})} = O_s(1)$. The boundedness now follows from the Gowers-Cauchy-Schwarz inequality (B.12). Tracing backwards, we see in turn that (11.9), (11.8), (11.7), (11.6) and (11.5) are all $O_s(1)$, thereby concluding the proof. □
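For $s = 1$ and real-valued functions, the Gowers-Cauchy-Schwarz inequality reads $|\mathbb{E}_{n,h_1,h_2} f_{00}(n)f_{10}(n+h_1)f_{01}(n+h_2)f_{11}(n+h_1+h_2)| \leq \prod_\omega \|f_\omega\|_{U^2}$. Since a 1-bounded function has $U^2$ norm at most 1, this bounds (11.9) by $\|f\|_{U^2(\mathbb{Z}_{N'})}$. A brute-force Python check (our own sketch):

```python
import itertools
import math
import random


def cube_average(f00, f10, f01, f11):
    """E_{n,h1,h2 in Z_N} f00(n) f10(n+h1) f01(n+h2) f11(n+h1+h2), real inputs."""
    N = len(f00)
    total = 0.0
    for n, h1, h2 in itertools.product(range(N), repeat=3):
        total += f00[n] * f10[(n + h1) % N] * f01[(n + h2) % N] * f11[(n + h1 + h2) % N]
    return total / N ** 3


def u2_norm(f):
    """||f||_{U^2(Z_N)} for real f (the fourth power is non-negative; clamp rounding)."""
    return max(cube_average(f, f, f, f), 0.0) ** 0.25


random.seed(7)
N = 9
fs = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(4)]
lhs = abs(cube_average(*fs))
rhs = math.prod(u2_norm(g) for g in fs)
assert lhs <= rhs + 1e-12
```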

Although we will not need this fact here, it is interesting to note that Proposition 11.2 allows one to extend Proposition 8.2 from bounded $f$ to integrable $f$:

Corollary 11.6 (Nilsequences obstruct uniformity, II). Let $s \geq 1$ and $\delta \in (0,1)$. Let $G/\Gamma = (G/\Gamma, d_{G/\Gamma})$ be a nilmanifold with some fixed smooth metric $d_{G/\Gamma}$, and let $(F(g^n x))_{n \in \mathbb{N}}$ be a bounded $s$-step nilsequence with Lipschitz constant at most $M$. Let $f : [N] \to \mathbb{R}$ be a function for which

$$\mathbb{E}_{n \in [N]} |f(n)| \leq 1$$

and

$$|\mathbb{E}_{n \in [N]} f(n) F(g^n x)| \geq \delta.$$

Then we have



$$\|f\|_{U^{s+1}[N]} \gg_{s,\delta,M,G/\Gamma} 1.$$






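Proof. We apply Proposition 11.2 with $\varepsilon$ equal to a small multiple of $\delta$. Since $|\mathbb{E}_{n \in [N]} f(n)F_2(n)| \leq \|F_2\|_\infty \mathbb{E}_{n \in [N]}|f(n)| = O(\varepsilon)$, we conclude from the triangle inequality that

$$|\mathbb{E}_{n \in [N]} f(n) F_1(n)| \geq \delta/2.$$

Since $|\mathbb{E}_{n \in [N]} f(n)F_1(n)| \leq \|f\|_{U^{s+1}[N]}\|F_1\|_{U^{s+1}[N]^*}$ and $F_1$ has a $U^{s+1}[N]^*$ norm of $O_{s,\delta,M,G/\Gamma}(1)$, the claim follows. □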



12. A SPLITTING OF THE VON MANGOLDT FUNCTION 



To summarise so far, we have reduced the task of proving that the GI(s) and MN(s) conjectures imply the Main Theorem to the much easier task of establishing Proposition 11.3. This is a correlation estimate involving $\Lambda'_{b,W}$. It is convenient to return at this point to the original von Mangoldt function $\Lambda$. The contribution from the prime powers which are introduced when $\Lambda'_{b,W}$ is replaced by $\Lambda_{b,W}$ is easily seen to be negligible, and so it suffices to establish the estimate

$$\mathbb{E}_{n \in [N]}(\Lambda_{b,W}(n) - 1)F_1(n) = o_{M,M',G/\Gamma,s}(1).$$

Recalling the definition (5.1) of $\Lambda_{b,W}(n)$, we are thus trying to establish the bound

$$\mathbb{E}_{n \in [N]}\Big(\frac{\phi(W)}{W}\Lambda(Wn + b) - 1\Big)F_1(n) = o_{M,M',G/\Gamma,s}(1). \qquad (12.1)$$



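At this point we perform a standard decomposition of $\Lambda$ into a "smooth" piece $\Lambda^\sharp$ corresponding to small divisors and a "rough" piece $\Lambda^\flat$ corresponding to large divisors. We take a small exponent $\gamma = \gamma_s > 0$, whose exact value will be specified later, and set $R := N^\gamma$. Observe, by Möbius inversion (using $\sum_{d \mid n}\mu(d) = 0$ for $n > 1$), that

$$\Lambda(n) = -\log R \sum_{d \mid n} \mu(d)\chi\Big(\frac{\log d}{\log R}\Big),$$

where $\chi : \mathbb{R} \to \mathbb{R}$ is the identity function $\chi(x) := x$. We now perform a smooth splitting $\chi = \chi^\sharp + \chi^\flat$, where $\chi^\sharp(x)$ vanishes for $|x| \geq 1$ and $\chi^\flat(x)$ vanishes for $|x| \leq 1/2$, the precise form of this splitting being unimportant. This induces a splitting $\Lambda = \Lambda^\sharp + \Lambda^\flat$, where

$$\Lambda^\sharp(n) := -\log R \sum_{d \mid n}\mu(d)\chi^\sharp\Big(\frac{\log d}{\log R}\Big) \quad\text{and}\quad \Lambda^\flat(n) := -\log R \sum_{d \mid n}\mu(d)\chi^\flat\Big(\frac{\log d}{\log R}\Big). \qquad (12.2)$$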
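The identity $\Lambda(n) = -\sum_{d \mid n}\mu(d)\log d$ and the splitting (12.2) are easy to test. In the Python sketch below (our own), the cutoff $\chi^\sharp(x) = x\,\rho(|x|)$, with $\rho$ a smooth function equal to 1 on $[0, 1/2]$ and 0 on $[1, \infty)$, is one hypothetical choice; as the text says, the precise form is unimportant, and $\chi^\flat := \chi - \chi^\sharp$.

```python
import math


def factorize(n):
    """Prime factorisation {p: exponent} of n >= 1 by trial division."""
    fac, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            fac[p] = fac.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    return fac


def mobius(n):
    fac = factorize(n)
    return 0 if any(k > 1 for k in fac.values()) else (-1) ** len(fac)


def von_mangoldt(n):
    fac = factorize(n)
    return math.log(next(iter(fac))) if len(fac) == 1 else 0.0


def smooth_step(t):
    """C-infinity: 0 for t <= 0, 1 for t >= 1."""
    def phi(u):
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return phi(t) / (phi(t) + phi(1.0 - t))


def chi_sharp(x):
    """Hypothetical choice: equals x for |x| <= 1/2 and vanishes for |x| >= 1."""
    return x * (1.0 - smooth_step(2.0 * abs(x) - 1.0))


def lambda_split(n, R):
    """(Lambda_sharp(n), Lambda_flat(n)) as in (12.2), with chi_flat = x - chi_sharp."""
    log_R = math.log(R)
    sharp = flat = 0.0
    for d in range(1, n + 1):
        if n % d == 0:
            x = math.log(d) / log_R
            sharp -= log_R * mobius(d) * chi_sharp(x)
            flat -= log_R * mobius(d) * (x - chi_sharp(x))
    return sharp, flat


for n in range(1, 300):
    s_part, f_part = lambda_split(n, R=20.0)
    assert abs(s_part + f_part - von_mangoldt(n)) < 1e-9
```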

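Thus to prove (12.1) it will suffice to show the estimates

$$\mathbb{E}_{n \in [N]}\Big(\frac{\phi(W)}{W}\Lambda^\sharp(Wn + b) - 1\Big)F_1(n) = o_{s,M'}(1) \qquad (12.3)$$

and

$$\mathbb{E}_{n \in [N]}\frac{\phi(W)}{W}\Lambda^\flat(Wn + b)F_1(n) = o_{M,G/\Gamma,s}(1). \qquad (12.4)$$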



We begin by establishing the bound (12.3). It is here that we need the dual norm bound (11.4). Indeed, from that bound we have

$$\Big|\mathbb{E}_{n \in [N]}\Big(\frac{\phi(W)}{W}\Lambda^\sharp(Wn + b) - 1\Big)F_1(n)\Big| \leq \Big\|\frac{\phi(W)}{W}\Lambda^\sharp(Wn + b) - 1\Big\|_{U^{s+1}[N]}\|F_1\|_{U^{s+1}[N]^*} \leq M'\Big\|\frac{\phi(W)}{W}\Lambda^\sharp(Wn + b) - 1\Big\|_{U^{s+1}[N]}.$$

It suffices, then, to show that

$$\Big\|\frac{\phi(W)}{W}\Lambda^\sharp(Wn + b) - 1\Big\|_{U^{s+1}[N]} = o_s(1). \qquad (12.5)$$

This is a multilinear correlation estimate for a truncated divisor sum, and can be treated by standard sieve theory methods related to the correlation estimates of Goldston and Yıldırım [17, 18], provided that the exponent $\gamma$ is sufficiently small (an appropriate choice would be, for example, $\gamma_s := \ldots$). We provide the details of this computation in Appendix D. This establishes (12.3).

It remains to establish the bound (12.4). Recall that $F_1$ is an averaged nilsequence. From the triangle inequality, it will thus suffice to prove the bound

$$\mathbb{E}_{n \in [N]}\frac{\phi(W)}{W}\Lambda^\flat(Wn + b)F(g^n x) = o_{M,G/\Gamma,s}(1) \qquad (12.6)$$

for all 1-bounded $s$-step nilsequences $F(g^n x)$ of Lipschitz constant $M$. We emphasise that the $o$-term is required to depend only on $M$, $G/\Gamma$ and $s$, and should otherwise be independent of $F$, $g$ and $x$.

We will eventually apply the MN(s) conjecture, which comes with the safety net of an error term which decays like $\log^{-A}N$ for any $A$. With this in mind, we begin by removing the $W$-dependence in (12.6) in a rather crude fashion. Since $\phi(W)/W \leq 1$, we ignore this factor completely.

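Now by a simple substitution we have (up to negligible boundary terms)

$$\mathbb{E}_{n \in [N]}\Lambda^\flat(Wn + b)F(g^n x) = W\,\mathbb{E}_{n \in [WN+b]}1_{n \equiv b \,(\mathrm{mod}\, W)}\Lambda^\flat(n)F(g^{(n-b)/W}x). \qquad (12.7)$$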

Now any Lie group $G$ over $\mathbb{R}$ for which the exponential map $\exp : \mathfrak{g} \to G$ from the associated Lie algebra is surjective is divisible, meaning that given any $g \in G$ and any positive integer $m$ there is an element $g^{1/m} \in G$ with $(g^{1/m})^m = g$. When $G$ is simply-connected and nilpotent, $\exp$ is a homeomorphism (see [7] for details). In our setting, write $g' := g^{1/W}$ and $x' := g^{-b/W}x$. Then for all $n \equiv b \pmod W$ we have

$$F(g'^n x') = F(g^{(n-b)/W}x). \qquad (12.8)$$

Note that the left-hand side here makes perfect sense for any $n$, not just for $n$ such that $n \equiv b \pmod W$.
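In a simply-connected nilpotent group the roots $g^{1/W}$ come from the one-parameter subgroup $t \mapsto \exp(t\log g)$. For the $3 \times 3$ upper unipotent (Heisenberg) group this is completely explicit: writing $g = I + X$ with $X$ strictly upper triangular, so that $X^3 = 0$, one gets $g^t = I + tX + \tfrac{t(t-1)}{2}X^2$. The exact-arithmetic Python sketch below (our own illustration; the matrix $g$ and the values of $W$, $b$ are arbitrary) checks $(g^{1/W})^W = g$ and the identity (12.8) at the level of group elements, taking $x$ to be the identity coset.

```python
from fractions import Fraction


def identity():
    return [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def int_power(A, m):
    """A^m for an integer m >= 0."""
    out = identity()
    for _ in range(m):
        out = matmul(out, A)
    return out


def real_power(g, t):
    """g^t = exp(t log g) for upper unipotent 3x3 g: I + t X + t(t-1)/2 X^2, X = g - I."""
    I = identity()
    X = [[g[i][j] - I[i][j] for j in range(3)] for i in range(3)]
    X2 = matmul(X, X)
    c = t * (t - 1) / 2
    return [[I[i][j] + t * X[i][j] + c * X2[i][j] for j in range(3)] for i in range(3)]


g = [[Fraction(1), Fraction(2, 3), Fraction(5, 7)],
     [Fraction(0), Fraction(1), Fraction(-3, 4)],
     [Fraction(0), Fraction(0), Fraction(1)]]
W, b = 5, 2
g_root = real_power(g, Fraction(1, W))    # g' = g^{1/W}
x_shift = real_power(g, Fraction(-b, W))  # x' = g^{-b/W} x, with x the identity coset
assert int_power(g_root, W) == g
for n in (b, b + W, b + 3 * W):           # n = b (mod W)
    assert matmul(int_power(g_root, n), x_shift) == int_power(g, (n - b) // W)
```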

The constraint $1_{n \equiv b \,(\mathrm{mod}\, W)}$ may be expanded as a Fourier series

$$1_{n \equiv b \,(\mathrm{mod}\, W)} = \frac{1}{W}\sum_{r \in \mathbb{Z}_W} e(-rb/W)\,e(rn/W)$$

on $\mathbb{Z}_W$. We substitute this and (12.8) into (12.7), noting that each function $n \mapsto e(rn/W)$ may be realised as a 1-bounded, $O(1)$-Lipschitz nilsequence on the 1-step nilmanifold $\mathbb{R}/\mathbb{Z}$. Replacing $G/\Gamma$ with $G/\Gamma \times \mathbb{R}/\mathbb{Z}$, we see that in order to prove (12.6) it suffices to show that

$$W\,\mathbb{E}_{n \in [WN+b]}\Lambda^\flat(n)F(g^n x) = o_{M,G/\Gamma,s}(1) \qquad (12.9)$$

for all $M$-Lipschitz 1-bounded nilsequences $(F(g^n x))_{n \in \mathbb{N}}$ on an $s$-step nilmanifold $G/\Gamma$.
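The Fourier expansion of the congruence condition used above is just the orthogonality of the additive characters of $\mathbb{Z}_W$. A quick floating-point check in Python (our own sketch; the sample moduli and residues are arbitrary):

```python
import cmath


def e(t):
    """e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * cmath.pi * t)


def indicator_via_fourier(n, b, W):
    """(1/W) sum_{r in Z_W} e(-r b / W) e(r n / W), which should equal 1_{n = b mod W}."""
    return sum(e(-r * b / W) * e(r * n / W) for r in range(W)) / W


for W, b in [(6, 1), (30, 7)]:
    for n in range(-60, 60):
        expected = 1 if (n - b) % W == 0 else 0
        assert abs(indicator_via_fourier(n, b, W) - expected) < 1e-9
```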

In fact we will establish the stronger estimate

$$\Big|\sum_{n \in [N]}\Lambda^\flat(n)F(g^n x)\Big| \ll_{M,G/\Gamma,s,A} N\log^{-A}N \qquad (12.10)$$

for any $A > 0$. Note that $w$ was chosen to be so slowly growing that $W = O(\log N)$, so this estimate really is stronger than (12.9). We expand the left-hand side of (12.10) using (12.2) and reduce to showing that

$$\Big|\sum_{n \in [N]}\log R\sum_{d \mid n}\mu(d)\chi^\flat\Big(\frac{\log d}{\log R}\Big)F(g^n x)\Big| \ll_{M,G/\Gamma,s,A} N\log^{-A}N. \qquad (12.11)$$

The left-hand side may be rearranged as

$$\log R\,\Big|\sum_{m \in [N]}\sum_{d \in [N/m]}\mu(d)\chi^\flat\Big(\frac{\log d}{\log R}\Big)F((g^m)^d x)\Big|.$$
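The rearrangement is the substitution $n = dm$: each pair $(n, d)$ with $d \mid n \leq N$ corresponds to exactly one pair $(m, d)$ with $m \in [N]$ and $d \in [N/m]$. A direct Python check (our own sketch), with arbitrary integer-valued stand-ins for $\mu(d)\chi^\flat(\log d/\log R)$ and for $F(g^n x)$:

```python
def divisor_form(N, c, G):
    """sum_{n <= N} sum_{d | n} c(d) G(n)."""
    return sum(c(d) * G(n) for n in range(1, N + 1)
               for d in range(1, n + 1) if n % d == 0)


def rearranged_form(N, c, G):
    """sum_{m <= N} sum_{d <= N/m} c(d) G(d m), after writing n = d m."""
    return sum(c(d) * G(d * m) for m in range(1, N + 1)
               for d in range(1, N // m + 1))


def c(d):  # hypothetical stand-in for mu(d) chi_flat(log d / log R)
    return (7 * d) % 5 - 2


def G(n):  # hypothetical stand-in for F(g^n x)
    return (n * n) % 11 - 5


assert divisor_form(200, c, G) == rearranged_form(200, c, G)
```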

Observe that $\chi^\flat$ is supported on $|x| \geq 1/2$, and so the summand vanishes unless $d \geq R^{1/2}$, in which case $m \leq N/R^{1/2}$. We now apply the Möbius and nilsequences conjecture MN(s). Together with a straightforward summation by parts to remove the smooth cutoff, this shows that

$$\Big|\sum_{d \in [N/m]}\mu(d)\chi^\flat\Big(\frac{\log d}{\log R}\Big)F((g^m)^d x)\Big| \ll_{M,G/\Gamma,s,A} \frac{N}{m}\log^{-A}\frac{N}{m}.$$

Note that we are making critical use here of the fact that the bounds in the MN(s) conjecture are uniform in the $g$ parameter, in order to deal with the fact that we have dilated $g$ to $g^m$. Since $m \leq N/R^{1/2}$, we have $\log(N/m) \geq \frac{\gamma}{2}\log N$, and so $\log^{-A}(N/m) \ll_{A,s} \log^{-A}N$. Summing in $m$, and absorbing the logarithmically divergent sum $\sum_{m \in [N]} m^{-1}$ and the factor $\log R$ into the $\log^{-A}N$ factor (at the cost of replacing $A$ by $A + 2$), we obtain (12.11) as desired. This in turn implies (12.10) and hence, by our earlier series of reductions, (12.4). Together with (12.3), which we have already established, this concludes the proof of Proposition 11.3. By our long series of earlier reductions, this (finally!) completes the proof of the Main Theorem. □






13. Variations on the main argument and other remarks 



It is conceivable that our methods here extend to certain "finite complexity" multilinear averages involving systems of polynomials $\psi_j(n)$ rather than affine-linear forms. Indeed, the machinery of "PET induction" (see e.g. [3]) allows us in principle to use repeated applications of Cauchy-Schwarz to control certain of these averages by Gowers uniformity norms. A model problem would be to count the number of $p, n$ for which the numbers $p, p + n, p + n^2, \ldots, p + n^k$ are all prime. A naive attempt to do this meets with what seems to be an insurmountable obstacle. Namely, in order to restrict the range of the primes concerned to an interval such as $[N]$, certain other parameters (for example the "shifts" $h$ in the definition of the Gowers norms) have to be restricted to a much smaller range, say of size $O(N^{1/k})$. This makes it impossible to pass back and forth between $[N]$ and $\mathbb{Z}_{N'}$ as we have done above, and the evaluation of exponential sums with $\mu$ or $\Lambda$ on such a range seems to be beyond hope, even assuming the GRH. It may be that the PET induction scheme can be "globalised" to avoid these issues, but we do not know how to address this at present.

For the benefit of readers who are only interested in the unconditional "quadratic" ($s = 2$) applications of this paper, such as Corollary 1.7 or Examples [...], we outline a shorter path to the Main Theorem in that case. This approach avoids Lie theory completely, and probably represents the best approach to obtaining bounds for error terms. Note, however, that with either approach our error terms are completely ineffective unless the GRH is assumed. The introduction of Lie theory, though strictly speaking unnecessary, seems to make our work easier to understand from the conceptual point of view. This is especially the case when $s \geq 3$, where it is not even clear how Lie theory-free analogues of the GI(s) and MN(s) conjectures might be formulated.

In the quadratic case it is possible to replace the concept of a 2-step nilsequence by more concrete objects. In a sense these are more basic than 2-step nilsequences, if only because in [26] we introduce these objects first and then build nilsequences from them. Note, however, that this may be an artifact of our approach.

These more basic objects can then be manipulated by hand without resorting to machinery such as the Host-Kra theory in Appendix E. Let us consider, by way of illustration, the following more concrete version of the inverse Gowers-norm conjecture GI(2), which was proven in [26].

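Theorem 13.1 ($U^3$ inverse theorem with bracket polynomials). Let $f : [N] \to [-1,1]$ be such that $\|f\|_{U^3[N]} \geq \delta$ for some $0 < \delta \leq 1$ and $N \geq 1$. Then there exists a positive integer $J = O_\delta(1)$ and real numbers $a_j, b_j, \xi_{j,1}, \xi_{j,2}, \xi_{j,3}$ for $j \in [J]$ such that

$$|\mathbb{E}_{n \in [N]} f(n)\, e(\phi(n))| \gg_\delta 1 \qquad (13.1)$$

where $\phi$ is the function

$$\phi(n) := \sum_{j \in [J]} \big(a_j\{\xi_{j,1}n\}\{\xi_{j,2}n\} + b_j\{\xi_{j,3}n\}\big).$$

Remark. As before, $\{x\}$ denotes the fractional part of $x$, which we take to lie in $(-\tfrac12, \tfrac12]$.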
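For concreteness, here is a Python sketch (our own) that evaluates a bracket polynomial phase $\phi(n)$ of the shape in Theorem 13.1, using the centred fractional part of the Remark, together with the correlation $\mathbb{E}_{n \in [N]} f(n)e(\phi(n))$ appearing in (13.1); the parameter values in the example are arbitrary.

```python
import cmath
import math


def centred_frac(x):
    """{x}: the representative of x mod 1 lying in (-1/2, 1/2]."""
    y = x - math.floor(x)  # y lies in [0, 1)
    return y - 1.0 if y > 0.5 else y


def bracket_phase(n, params):
    """phi(n) = sum_j (a_j {xi_j1 n}{xi_j2 n} + b_j {xi_j3 n}).

    params is a list of tuples (a_j, b_j, xi_j1, xi_j2, xi_j3) of reals.
    """
    return sum(a * centred_frac(x1 * n) * centred_frac(x2 * n) + b * centred_frac(x3 * n)
               for a, b, x1, x2, x3 in params)


def correlation(f, params):
    """E_{n in [N]} f(n) e(phi(n)); the list f holds f(1), ..., f(N)."""
    N = len(f)
    return sum(f[n - 1] * cmath.exp(2j * cmath.pi * bracket_phase(n, params))
               for n in range(1, N + 1)) / N


params = [(1.0, 0.0, 0.25, 0.5, 0.0)]
print(bracket_phase(1, params), bracket_phase(3, params))  # prints: 0.125 -0.125
```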

This result follows quickly from [26, Theorem 10.9] using Lemma B.5 to work in a cyclic group of prime order. We refer to the phase $\phi(n)$ in (13.1) as a "bracket polynomial". By modifying the arguments in §10 one can transfer this theorem to the case when $f$ is bounded by a pseudorandom measure $\nu$ rather than by 1, thereby reducing Theorem 7.2 to the establishment of the exponential sum estimate

$$\mathbb{E}_{n \in [N]}(\Lambda_{b,W}(n) - 1)\, e\Big(-\sum_{j \in [J]}\big(a_j\{\xi_{j,1}n\}\{\xi_{j,2}n\} + b_j\{\xi_{j,3}n\}\big)\Big) = o_J(1)$$

uniformly over all $b \in [W]$ with $\gcd(b, W) = 1$. This could in principle be established directly by Vinogradov's method, following the machinery in [27], though the argument would be rather lengthy. Alternatively, one can deduce this result from the corresponding results for the Möbius function established in [27], using a variant of the arguments in this paper.

A key difference is that the Host–Kra machinery and the machinery of averaged nilsequences are no longer required. Instead, the above function $e(\phi(n))$ can be replaced by a smoother variant, constructed for instance using a variant of the dual function machinery in [21], in order to obtain a function which is bounded in $(U^3)^*$. This provides an analogue of Proposition 11.3, and from that point onwards one may proceed similarly.

One could also use a still more "basic" type of obstruction for the $U^3$-norm, namely phases which are locally quadratic on Bohr sets (cf. [26, §2]). These require even less unpacking than the bracket quadratics above, and indeed it was found to be rather convenient to work with these functions in [27]. It takes a while to even define these functions properly, however, and they suffer from a few technical deficiencies which affect various other steps of the argument. Perhaps the most serious is that if $n \mapsto f(n)$ is such a function then $n \mapsto f(dn)$ need not quite be one, a phenomenon which causes trouble in §12. □



14. A BRIEF DISCUSSION OF BOUNDS 



We have shied away from giving any explicit bounds on our $o(1)$ error terms. There are at least two reasons for this. Firstly, it is notationally easier to avoid doing so. Secondly, and much more importantly, unless one assumes the GRH we do not have any explicit bounds!

By way of illustration, let us consider the statement

$$\mathbb{E}_{x, d \leq N}\, \mu(x)\mu(x+d)\mu(x+2d)\mu(x+3d) = o(1), \tag{14.1}$$

which follows from the case s = 2 of Proposition 9.1. A discussion of correlations involving $\Lambda$ would go along similar lines, but there is the distraction of the singular product $\mathfrak{S} = \prod_p \beta_p$.
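As a purely numerical illustration of (14.1), the following Python sketch sieves the Möbius function and computes the normalised four-term correlation. It is not a proof and makes no claim about the rate of decay. The sketch reads $\mathbb{E}_{x, d \leq N}$ as a uniform average over $1 \leq x, d \leq N$, which is an assumption about the normalisation.

```python
from typing import List


def mobius_table(m: int) -> List[int]:
    """Return mu[0..m], where mu[n] is the Mobius function (mu[0] is set to 0)."""
    mu = [1] * (m + 1)
    mu[0] = 0
    is_prime = [True] * (m + 1)
    for p in range(2, m + 1):
        if not is_prime[p]:
            continue
        for k in range(2 * p, m + 1, p):
            is_prime[k] = False
        for k in range(p, m + 1, p):
            mu[k] = -mu[k]          # one sign flip for each prime factor p
        for k in range(p * p, m + 1, p * p):
            mu[k] = 0               # n divisible by p^2 has mu(n) = 0
    return mu


def four_term_mobius_average(n: int) -> float:
    """E_{1 <= x, d <= n} mu(x) mu(x+d) mu(x+2d) mu(x+3d)."""
    mu = mobius_table(4 * n)        # x + 3d <= 4n
    total = 0
    for x in range(1, n + 1):
        for d in range(1, n + 1):
            total += mu[x] * mu[x + d] * mu[x + 2 * d] * mu[x + 3 * d]
    return total / (n * n)


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, four_term_mobius_average(n))
```

Since the error term in (14.1) is ineffective, small-$N$ numerics like these say nothing rigorous about its size.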

As we remarked, the error term here is completely ineffective without assuming GRH. Indeed, to show that the left-hand side of (14.1) is at most $\delta$, we would ultimately (deep inside the paper [27]) need estimates for the sum of the Möbius function over arithmetic progressions with common difference $q \sim \log^{A(\delta)} N$. Although such estimates exist, the error terms involve an ineffective constant $C(A(\delta))$ due to the possible presence of Landau–Siegel zeros.

Footnote (to the exponential sum estimate in §13): Indeed, this exponential sum is a more complicated variant of the more traditional exponential sum $\sum_{n \in [N]} \Lambda(n) e(\alpha n^2)$, which was considered for instance in [15], [33].

Assuming the GRH one could prove using our methods that

$$|\mathbb{E}_{x, d \leq N}\, \mu(x)\mu(x+d)\mu(x+2d)\mu(x+3d)| \leq C \log^{-c} N$$

for some explicit $C$ and some explicit (but small) $c > 0$. To obtain such a result it would be best to avoid the use of Lie theory as outlined in §13, since the many approximation arguments involved in that theory are quite costly from the quantitative point of view.

Improved results in additive combinatorics could lead to a bound of the shape $\exp(-\log^c N)$. In particular, a solution to the so-called Polynomial Freiman–Ruzsa conjecture could be used as an input in [...]. However, it seems that obtaining a bound $N^{-c}$ is very difficult.

Unconditionally, a bound in (14.1) of the form $O(f(N))$ would be very interesting, where $f(N)$ is some explicit function tending to zero as $N \to \infty$ and the implied constant in the $O(\cdot)$ may be ineffective.

To set the above discussion in context, we mention the best available results for three-term progressions, which follow from estimates for $\sup_{\alpha \in \mathbb{R}/\mathbb{Z}} |\mathbb{E}_{n \leq N} \mu(n) e(\alpha n)|$. These seem to be as follows:

- unconditionally, a bound $O_A(\log^{-A} N)$ for any $A > 0$ (Davenport [10]);
- on GRH, a power saving in $N$ (Baker–Harman [2]).

Bounds of a similar type could be obtained for any instance of Proposition 9.1 with s = 1.

In this appendix we recall some profoundly classical facts concerning convex bodies which will allow us to manipulate cutoffs such as $1_K$ readily, beginning with an ancient observation of Archimedes.

Lemma A.1 (Archimedes comparison principle). Let $K_1 \subseteq K_2 \subseteq \mathbb{R}^d$ be bounded convex bodies. Then the surface area of $K_1$ is less than or equal to the surface area of $K_2$.

Proof. It is easy to see that the intersection of $K_2$ with a half-space has lesser or equal surface area than $K_2$. Since $K_1$ can be approximated to arbitrary accuracy by the intersection of finitely many half-spaces, the claim follows. □
Corollary A.2 (Boundary region estimate). Let $K \subseteq [-N, N]^d$ be a convex body. If $\varepsilon \in (0, 1)$, then the $\varepsilon N$-neighbourhood of the boundary $\partial K$ has volume $O_d(\varepsilon N^d)$.

Proof. Rescale so that $N = 1$. By differentiating in $\varepsilon$, we see that it suffices to show that any convex body in $[-2, 2]^d$ has surface area $O_d(1)$. This follows from the Archimedes comparison principle, applied with $K_2 = [-2, 2]^d$. One could also derive this fact using the theory of mixed volumes; see [38]. □



At this point we can readily prove (1.3) using the Gauss volume-packing argument. By intersecting $K$ with the half-spaces $\{x \in \mathbb{R}^d : \psi_j(x) > 0\}$, it suffices to show that

$$|K \cap \mathbb{Z}^d| = \mathrm{vol}_d(K) + O_d(N^{d-1})$$

for all convex bodies $K \subseteq [-N, N]^d$. Now $|K \cap \mathbb{Z}^d|$ is equal to the volume of the set $(K \cap \mathbb{Z}^d) + [-1/2, 1/2]^d$, which differs from $K$ only on the $O_d(1)$-neighbourhood of $\partial K$. The claim therefore follows from Corollary A.2, applied with $\varepsilon \asymp_d 1/N$.
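The Gauss volume-packing bound $|K \cap \mathbb{Z}^d| = \mathrm{vol}_d(K) + O_d(N^{d-1})$ is easy to test numerically. In the Python sketch below, $K$ is taken to be the closed disc of radius $N$ in $\mathbb{R}^2$. This is an illustrative choice of convex body, not one used in the paper. The sketch counts lattice points exactly with integer square roots and reports the discrepancy from $\pi N^2$ divided by $N$, a quantity which the argument above predicts stays bounded.

```python
import math


def lattice_points_in_disc(n: int) -> int:
    """Number of (x, y) in Z^2 with x^2 + y^2 <= n^2, computed exactly."""
    count = 0
    for x in range(-n, n + 1):
        r = math.isqrt(n * n - x * x)   # largest |y| allowed in this column
        count += 2 * r + 1
    return count


def normalised_error(n: int) -> float:
    """(|K cap Z^2| - vol(K)) / n for K the closed disc of radius n."""
    return (lattice_points_in_disc(n) - math.pi * n * n) / n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, lattice_points_in_disc(n), round(normalised_error(n), 4))
```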

Now we give an analytic consequence of Corollary A.2.

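Corollary A.3 (Lipschitz approximation of convex indicators). Let $K \subseteq [-N, N]^d$ be a convex body and let $\varepsilon \in (0, 1)$. Then we can write $1_K = F_\varepsilon + O(G_\varepsilon)$, where:

- $F_\varepsilon$ and $G_\varepsilon$ are non-negative Lipschitz functions on $[-2N, 2N]^d$;
- both have Lipschitz constants $O(1/(\varepsilon N))$ and are bounded in magnitude by 1;
- $\int_{\mathbb{R}^d} G_\varepsilon(x)\, dx = O_d(\varepsilon N^d)$.

Proof. We take

$$F_\varepsilon(x) := \max\Big( 1 - \frac{\mathrm{dist}(x, K)}{\varepsilon N}, 0 \Big) \quad \text{and} \quad G_\varepsilon(x) := \max\Big( 1 - \frac{\mathrm{dist}(x, \partial K)}{\varepsilon N}, 0 \Big).$$

On $K$ we have $F_\varepsilon = 1_K$. Off $K$ we have $\mathrm{dist}(x, K) = \mathrm{dist}(x, \partial K)$, so $0 \leq F_\varepsilon = G_\varepsilon$ there. Hence $|1_K - F_\varepsilon| \leq G_\varepsilon$. Since $G_\varepsilon$ is supported on the $\varepsilon N$-neighbourhood of $\partial K$, the claim follows easily from Corollary A.2. □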
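A one-dimensional check of Corollary A.3 may help to see why the decomposition works. Take an interval $K = [a, b] \subset \mathbb{R}$, so that $\partial K = \{a, b\}$. The Python sketch below implements $F_\varepsilon$ and $G_\varepsilon$ as defined in the proof and checks on a grid that $|1_K - F_\varepsilon| \leq G_\varepsilon$. It also checks that $\int G_\varepsilon$ is close to $2\varepsilon N$ (two tents of base $2\varepsilon N$ and height 1). The interval, the grid and the helper names are assumptions made for illustration only.

```python
def indicator(x: float, a: float, b: float) -> float:
    """1_K(x) for K = [a, b]."""
    return 1.0 if a <= x <= b else 0.0


def dist_to_interval(x: float, a: float, b: float) -> float:
    """dist(x, K) for K = [a, b]; zero inside K."""
    return max(a - x, 0.0, x - b)


def dist_to_boundary(x: float, a: float, b: float) -> float:
    """dist(x, dK) for dK = {a, b}."""
    return min(abs(x - a), abs(x - b))


def f_eps(x, a, b, eps, N):
    return max(1.0 - dist_to_interval(x, a, b) / (eps * N), 0.0)


def g_eps(x, a, b, eps, N):
    return max(1.0 - dist_to_boundary(x, a, b) / (eps * N), 0.0)


def check(a=-5.0, b=5.0, eps=0.1, N=10.0, step=1e-3):
    """Return (largest value of |1_K - F| - G on the grid, Riemann sum of G)."""
    worst, integral = 0.0, 0.0
    for k in range(int(-2 * N / step), int(2 * N / step) + 1):
        x = k * step                       # grid on [-2N, 2N]
        g = g_eps(x, a, b, eps, N)
        gap = abs(indicator(x, a, b) - f_eps(x, a, b, eps, N))
        worst = max(worst, gap - g)        # should never be positive
        integral += g * step
    return worst, integral
```

With the default parameters $\varepsilon N = 1$, so the integral of $G_\varepsilon$ should come out close to $2$.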



In practice, Corollary A.3 allows us to replace a rough cutoff such as $1_K$ with the smoother operation of Lipschitz cutoffs. This can then be combined with Fourier analysis to replace the Lipschitz cutoffs in turn with modulations by linear phases, which turn out to be utterly harmless in our analysis. This might remind readers of the Pólya–Vinogradov completion-of-sums method, or the Erdős–Turán inequality.



Appendix B. Gowers norm theory



In this appendix we develop the general "elementary" theory of Gowers uniformity norms, which were introduced in [21] and subsequently, in the rather different context of ergodic theory, in [32]. By elementary in this context, we basically mean that we only pursue here those results which can be obtained as an easy consequence of the Cauchy–Schwarz inequality. This is in contrast to the more advanced inverse theory involving nilsequences, Fourier analysis, and suchlike. The theory here is an amalgam
of parts of m §3], [23], M §5], [2S1 §1], [32], [HI §3], [HI SS] , or [IHl Ch. 11]. 
It is convenient to work rather abstractly at first, dealing with complex-valued functions of many variables. This level of abstraction will be useful for us when we prove the generalised von Neumann theorem, Proposition 7.1, in Appendix C. The argument is essentially that of [24, §5], generalised to handle arbitrary systems of linear forms rather than merely k-term APs. The extra notation introduced here somewhat eases the process of actually carrying this out.

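Definition B.1 (Gowers box norms). Let $(X_\alpha)_{\alpha \in A}$ be a finite non-empty collection of finite non-empty sets. For any $B \subseteq A$, write $X_B := \prod_{\alpha \in B} X_\alpha$ for the Cartesian product. If $f : X_A \to \mathbb{C}$ is a complex-valued function, we define the Gowers box norm $\|f\|_{\Box(X_A)} \geq 0$ to be the unique non-negative real number such that

$$\|f\|_{\Box(X_A)}^{2^{|A|}} := \mathbb{E}_{x_A^{(0)}, x_A^{(1)} \in X_A} \prod_{\omega_A \in \{0,1\}^A} \mathcal{C}^{|\omega_A|} f(x_A^{(\omega_A)}). \tag{B.1}$$

Here $\mathcal{C} : z \mapsto \overline{z}$ is complex conjugation. For any $x_A^{(0)} = (x_\alpha^{(0)})_{\alpha \in A}$ and $x_A^{(1)} = (x_\alpha^{(1)})_{\alpha \in A}$ in $X_A$ and $\omega_A = (\omega_\alpha)_{\alpha \in A}$ in $\{0,1\}^A$, we write $x_A^{(\omega_A)} := (x_\alpha^{(\omega_\alpha)})_{\alpha \in A}$ and $|\omega_A| := \sum_{\alpha \in A} \omega_\alpha$. We adopt the convention that if $A$ is empty (so that $f$ is a constant), then $\|f\|_{\Box(X_A)} := f$.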



It is not immediately obvious that the right-hand side of (B.1) is non-negative, or that the term "norm" is appropriate. We will establish both of these facts below.

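Examples. If $A = \{1\}$, then

$$\|f\|_{\Box(X_1)} = \Big( \mathbb{E}_{x_1^{(0)}, x_1^{(1)} \in X_1} f(x_1^{(0)}) \overline{f(x_1^{(1)})} \Big)^{1/2} = |\mathbb{E}_{x_1 \in X_1} f(x_1)|,$$

while if $A = \{1, 2\}$, then

$$\|f\|_{\Box(X_{\{1,2\}})} = \Big( \mathbb{E}_{x_1^{(0)}, x_1^{(1)} \in X_1;\ x_2^{(0)}, x_2^{(1)} \in X_2} f(x_1^{(0)}, x_2^{(0)})\, \overline{f(x_1^{(1)}, x_2^{(0)})}\, \overline{f(x_1^{(0)}, x_2^{(1)})}\, f(x_1^{(1)}, x_2^{(1)}) \Big)^{1/4}.$$

In general, the $2^{|A|}$-th power of the $\Box(X_A)$ norm of $f_A$ is a multilinear average of $f_A$ over $|A|$-dimensional boxes (hence the name).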
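For small finite sets, (B.1) can be evaluated by brute force, which is a convenient way to check the examples above. Below is a minimal Python sketch, assuming each $X_\alpha$ is given as a list and $f$ as a function of a tuple $x_A$ (a representation chosen for illustration; the code handles non-empty $A$ only). The cost grows like $|X_A|^2 \cdot 2^{|A|}$, so only tiny examples are feasible.

```python
import cmath
import itertools
from typing import Callable, Sequence, Tuple

Point = Tuple[int, ...]


def box_norm(f: Callable[[Point], complex], sets: Sequence[Sequence[int]]) -> float:
    """Brute-force Gowers box norm ||f||_{Box(X_A)} computed from (B.1).

    sets[alpha] lists the finite set X_alpha; f takes a tuple x_A.
    """
    k = len(sets)
    cube = list(itertools.product((0, 1), repeat=k))
    total, count = 0j, 0
    for x0 in itertools.product(*sets):
        for x1 in itertools.product(*sets):
            prod = 1 + 0j
            for omega in cube:
                # x_A^(omega): take coordinate alpha from x1 if omega_alpha = 1
                point = tuple(x1[a] if omega[a] else x0[a] for a in range(k))
                value = complex(f(point))
                if sum(omega) % 2 == 1:       # C^{|omega|}: conjugate when |omega| is odd
                    value = value.conjugate()
                prod *= value
            total += prod
            count += 1
    avg = total / count                       # real and >= 0 up to rounding error
    return max(avg.real, 0.0) ** (1.0 / 2 ** k)
```

For a product function $f(x_1, x_2) = g(x_1) h(x_2)$, one finds $\|f\|_{\Box} = (\mathbb{E}|g|^2\, \mathbb{E}|h|^2)^{1/2}$, not $|\mathbb{E} g|\,|\mathbb{E} h|$. This reflects (B.4) below: each factor depends on only one variable.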



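It is easy to verify the recursive relationship

$$\|f\|_{\Box(X_A)}^{2^{|A|}} = \mathbb{E}_{x_\alpha^{(0)}, x_\alpha^{(1)} \in X_\alpha} \big\| f(x_\alpha^{(0)}, \cdot)\, \overline{f(x_\alpha^{(1)}, \cdot)} \big\|_{\Box(X_{A \setminus \{\alpha\}})}^{2^{|A|-1}} \tag{B.2}$$

whenever $\alpha \in A$. This relationship can be used as an alternate definition of the box norms. In particular, we see that the box norms $\|f\|_{\Box(X_A)}$ are non-negative for $A$ non-empty. These norms are also conjugation-invariant, homogeneous, and enjoy the positivity property

$$\|f\|_{\Box(X_A)} \leq \|\nu\|_{\Box(X_A)} \tag{B.3}$$

whenever $f : X_A \to \mathbb{C}$ and $\nu : X_A \to \mathbb{R}^+$ obey the pointwise bound $|f(x_A)| \leq \nu(x_A)$ for all $x_A \in X_A$.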

The box norms are also invariant under a large class of phase modulations. Indeed, one easily verifies from (B.2) and induction that

$$\Big\| f\, e\Big( \sum_{B \subsetneq A} \phi_B \Big) \Big\|_{\Box(X_A)} = \|f\|_{\Box(X_A)}. \tag{B.4}$$

Here $e : \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ is the standard character $e(x) := e^{2\pi i x}$, and for each proper subset $B \subsetneq A$ the phase function $\phi_B : X_B \to \mathbb{R}/\mathbb{Z}$ is arbitrary. Thus the $\Box(X_A)$ norm is insensitive to "lower order" modulations which involve only a proper subset of the variables in $X_A$.

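A fundamental inequality concerning these norms is the following.

Lemma B.2 (Gowers–Cauchy–Schwarz inequality). Let $(X_\alpha)_{\alpha \in A}$ be a finite collection of finite non-empty sets. For every $\omega_A \in \{0,1\}^A$, let $f_{\omega_A} : X_A \to \mathbb{C}$ be a function. Then

$$\Big| \mathbb{E}_{x_A^{(0)}, x_A^{(1)} \in X_A} \prod_{\omega_A \in \{0,1\}^A} \mathcal{C}^{|\omega_A|} f_{\omega_A}(x_A^{(\omega_A)}) \Big| \leq \prod_{\omega_A \in \{0,1\}^A} \|f_{\omega_A}\|_{\Box(X_A)}. \tag{B.5}$$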

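Proof. We induct on $|A|$. When $|A| = 0$ the claim trivially holds, and in fact there is equality. Now suppose that $|A| \geq 1$ and the claim has already been proven for smaller sets $A$.

Partition $A$ as $A' \cup \{\alpha\}$ for some $\alpha \in A$. We can rewrite the left-hand side of (B.5) as

$$\Big| \mathbb{E}_{x_{A'}^{(0)}, x_{A'}^{(1)} \in X_{A'}} \prod_{\omega_\alpha \in \{0,1\}} \mathcal{C}^{\omega_\alpha} F_{\omega_\alpha}(x_{A'}^{(0)}, x_{A'}^{(1)}) \Big|,$$

where

$$F_{\omega_\alpha}(x_{A'}^{(0)}, x_{A'}^{(1)}) := \mathbb{E}_{x_\alpha \in X_\alpha} \prod_{\omega_{A'} \in \{0,1\}^{A'}} \mathcal{C}^{|\omega_{A'}|} f_{\omega_{A'}, \omega_\alpha}(x_{A'}^{(\omega_{A'})}, x_\alpha).$$

By Cauchy–Schwarz, it thus suffices to show that

$$\mathbb{E}_{x_{A'}^{(0)}, x_{A'}^{(1)} \in X_{A'}} |F_{\omega_\alpha}(x_{A'}^{(0)}, x_{A'}^{(1)})|^2 \leq \prod_{\omega_{A'} \in \{0,1\}^{A'}} \|f_{\omega_{A'}, \omega_\alpha}\|_{\Box(X_A)}^2$$

for each $\omega_\alpha \in \{0,1\}$. We can expand the left-hand side as

$$\mathbb{E}_{x_\alpha^{(0)}, x_\alpha^{(1)} \in X_\alpha} \mathbb{E}_{x_{A'}^{(0)}, x_{A'}^{(1)} \in X_{A'}} \prod_{\omega_{A'} \in \{0,1\}^{A'}} \mathcal{C}^{|\omega_{A'}|} \Big( f_{\omega_{A'}, \omega_\alpha}(x_{A'}^{(\omega_{A'})}, x_\alpha^{(0)})\, \overline{f_{\omega_{A'}, \omega_\alpha}(x_{A'}^{(\omega_{A'})}, x_\alpha^{(1)})} \Big).$$

Applying the induction hypothesis, we can bound this by

$$\mathbb{E}_{x_\alpha^{(0)}, x_\alpha^{(1)} \in X_\alpha} \prod_{\omega_{A'} \in \{0,1\}^{A'}} \big\| f_{\omega_{A'}, \omega_\alpha}(\cdot, x_\alpha^{(0)})\, \overline{f_{\omega_{A'}, \omega_\alpha}(\cdot, x_\alpha^{(1)})} \big\|_{\Box(X_{A'})},$$

and the claim now follows from Hölder's inequality and (B.2). □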
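Inequality (B.5) can be sanity-checked numerically in the case $|A| = 2$. The Python sketch below draws four random complex functions $f_\omega$ on a $3 \times 4$ grid and compares both sides of (B.5). It also checks the equality case, where all four functions coincide. Grid sizes, the random distribution and helper names are arbitrary choices made for this illustration.

```python
import itertools
import random
from typing import Dict, List, Tuple

Grid = List[List[complex]]          # f[x][y] for x in X, y in Y


def box_norm_2d(f: Grid) -> float:
    """||f||_{Box(X x Y)} for |A| = 2, straight from (B.1)."""
    nx, ny = len(f), len(f[0])
    total = 0j
    for x0, x1 in itertools.product(range(nx), repeat=2):
        for y0, y1 in itertools.product(range(ny), repeat=2):
            total += (f[x0][y0] * f[x1][y0].conjugate()
                      * f[x0][y1].conjugate() * f[x1][y1])
    return max((total / (nx * nx * ny * ny)).real, 0.0) ** 0.25


def gcs_sides(fs: Dict[Tuple[int, int], Grid]) -> Tuple[float, float]:
    """Return (|left side|, right side) of (B.5) for a family f_omega, omega in {0,1}^2."""
    nx, ny = len(fs[(0, 0)]), len(fs[(0, 0)][0])
    total = 0j
    for x0, x1 in itertools.product(range(nx), repeat=2):
        for y0, y1 in itertools.product(range(ny), repeat=2):
            xs, ys = (x0, x1), (y0, y1)
            prod = 1 + 0j
            for (w1, w2), f in fs.items():
                v = f[xs[w1]][ys[w2]]
                prod *= v.conjugate() if (w1 + w2) % 2 else v   # C^{|omega|}
            total += prod
    lhs = abs(total / (nx * nx * ny * ny))
    rhs = 1.0
    for f in fs.values():
        rhs *= box_norm_2d(f)
    return lhs, rhs


def random_grid(rng: random.Random, nx: int = 3, ny: int = 4) -> Grid:
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(ny)]
            for _ in range(nx)]
```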

From (B.5) we easily deduce the Gowers triangle inequality

$$\|f + g\|_{\Box(X_A)} \leq \|f\|_{\Box(X_A)} + \|g\|_{\Box(X_A)},$$

as can be seen by raising both sides to the power $2^{|A|}$. Let us also observe, setting all but one of the functions in (B.5) to be Kronecker delta functions, that if $\|f\|_{\Box(X_A)} = 0$ and $|A| \geq 2$ then $f$ vanishes identically. Thus the $\Box(X_A)$-norm is indeed a norm for $|A| \geq 2$, whilst for $|A| = 1$ it is merely a semi-norm.

Footnote (to Lemma B.2): In our treatment here, the Gowers–Cauchy–Schwarz inequality plays a more central role than in earlier papers. We use it as a kind of "universal Cauchy–Schwarz inequality": any other inequality that we need, which would in earlier papers be proven by multiple applications of the ordinary Cauchy–Schwarz inequality, is instead proven here by a single application of the Gowers–Cauchy–Schwarz inequality. This seems to fit with the philosophy that the Gowers norms are somehow "universal" or "characteristic" for all averages of a certain complexity.

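As a consequence of the Gowers–Cauchy–Schwarz inequality we obtain the following.

Corollary B.3 (Second Gowers–Cauchy–Schwarz inequality). Let $(X_\alpha)_{\alpha \in A}$ be a collection of finite non-empty sets. For every $B \subseteq A$, let $f_B : X_B \to \mathbb{C}$ be a function. Then

$$\Big| \mathbb{E}_{x_A \in X_A} \prod_{B \subseteq A} f_B(x_B) \Big| \leq \prod_{B \subseteq A} \big\| f_B^{[2^{|A|-|B|}]} \big\|_{\Box(X_B)}^{1/2^{|A|-|B|}}, \tag{B.6}$$

where $x_B \in X_B$ is the restriction of $x_A$ to the indices $B$. For a complex number $z$ and a power of two $n$, we write $z^{[n]} := z$ when $n = 1$ and $z^{[n]} := |z|^n$ when $n > 1$.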



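Proof. For each $\omega_A \in \{0,1\}^A$, we let $f_{\omega_A} : X_A \to \mathbb{C}$ be the function

$$f_{\omega_A}(x_A) := \mathcal{C}^{|\omega_A|} f_B(x_B),$$

where $B := \{\alpha \in A : \omega_\alpha = 1\}$. Then we can rewrite the above left-hand side as

$$\Big| \mathbb{E}_{x_A^{(0)}, x_A^{(1)} \in X_A} \prod_{\omega_A \in \{0,1\}^A} \mathcal{C}^{|\omega_A|} f_{\omega_A}(x_A^{(\omega_A)}) \Big|,$$

which by the Gowers–Cauchy–Schwarz inequality is bounded by

$$\prod_{\omega_A \in \{0,1\}^A} \|f_{\omega_A}\|_{\Box(X_A)}.$$

However, direct calculation (using (B.2), for instance) shows that

$$\|f_{\omega_A}\|_{\Box(X_A)} = \big\| f_B^{[2^{|A|-|B|}]} \big\|_{\Box(X_B)}^{1/2^{|A|-|B|}},$$

where $B := \{\alpha \in A : \omega_\alpha = 1\}$, and the claim follows. □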

As a special case of Corollary B.3 (together with (B.3)), we see that

$$\Big| \mathbb{E}_{x_A \in X_A} f_A(x_A) \prod_{B \subsetneq A} f_B(x_B) \Big| \leq \|f_A\|_{\Box(X_A)} \tag{B.7}$$

whenever the functions $f_B$ are bounded in magnitude by 1 for $B \subsetneq A$; compare this with (B.4). The inequality (B.7) asserts that the $\Box$ norm is stable with respect to lower order functions. It can be viewed as a type of generalised von Neumann theorem.
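In the simplest case $A = \{1, 2\}$, (B.7) says that multiplying $f_A(x, y)$ by bounded functions of $x$ alone, of $y$ alone, and a constant cannot push the average above $\|f_A\|_{\Box}$. The Python sketch below checks this numerically with random unimodular lower-order factors. It also checks that the bound is attained when $f_A$ is itself a conjugated product of such factors. The grid sizes and helper names are illustrative assumptions.

```python
import cmath
import itertools
import random


def box_norm_2d(f):
    """||f||_{Box(X x Y)} for a grid f[x][y], straight from (B.1) with |A| = 2."""
    nx, ny = len(f), len(f[0])
    total = 0j
    for x0, x1 in itertools.product(range(nx), repeat=2):
        for y0, y1 in itertools.product(range(ny), repeat=2):
            total += (f[x0][y0] * f[x1][y0].conjugate()
                      * f[x0][y1].conjugate() * f[x1][y1])
    return max((total / (nx * nx * ny * ny)).real, 0.0) ** 0.25


def lower_order_correlation(f_a, f_1, f_2, f_empty):
    """|E_{x,y} f_A(x,y) f_{1}(x) f_{2}(y) f_{empty}|, the left side of (B.7)."""
    nx, ny = len(f_a), len(f_a[0])
    total = sum(f_a[x][y] * f_1[x] * f_2[y] * f_empty
                for x in range(nx) for y in range(ny))
    return abs(total) / (nx * ny)


def unit_phase(rng: random.Random) -> complex:
    """A random complex number of modulus 1."""
    return cmath.exp(2j * cmath.pi * rng.random())
```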

Remark. If $f_A$ is also bounded by 1, then there is a converse to (B.7). Namely, there exist functions $f_B$, $B \subsetneq A$, bounded in magnitude by 1, for which

$$\Big| \mathbb{E}_{x_A \in X_A} f_A(x_A) \prod_{B \subsetneq A} f_B(x_B) \Big| \geq \|f_A\|_{\Box(X_A)}^{2^{|A|}}.$$

This follows easily from (B.1), which expresses $\|f_A\|_{\Box(X_A)}^{2^{|A|}}$ as an average. One uses the pigeonhole principle to freeze the $x_A^{(1)}$ variables. Thus the lower order functions $\prod_{B \subsetneq A} f_B(x_B)$ are "characteristic" for the $\Box(X_A)$ norm: if $\|f_A\|_{\Box(X_A)}$ is large, then $f_A$ correlates with a function of the form $\prod_{B \subsetneq A} f_B(x_B)$. One can pursue this idea to eventually obtain the hypergraph version of the Szemerédi regularity lemma, a task which was carried out fully in [H].



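In our applications we will need to generalise (B.7) to the case where the $f_B$ are bounded by some other functions $\nu_B$. Fortunately, this is also an easy consequence of Corollary B.3.

Corollary B.4 (Weighted generalised von Neumann theorem). Let $(X_\alpha)_{\alpha \in A}$ be a finite collection of finite non-empty sets. For every $B \subseteq A$, let $f_B : X_B \to \mathbb{C}$ and $\nu_B : X_B \to \mathbb{R}^+$ be functions such that $|f_B(x_B)| \leq \nu_B(x_B)$ for all $x_B \in X_B$. Then

$$\Big| \mathbb{E}_{x_A \in X_A} \prod_{B \subseteq A} f_B(x_B) \Big| \leq \|f_A\|_{\Box(\nu; X_A)} \prod_{B \subsetneq A} \|\nu_B\|_{\Box(\nu; X_B)}^{1/2^{|A|-|B|}}. \tag{B.8}$$

Here, for any $B \subseteq A$ and $g_B : X_B \to \mathbb{C}$, we define $\|g_B\|_{\Box(\nu; X_B)}$ to be the unique non-negative real number satisfying

$$\|g_B\|_{\Box(\nu; X_B)}^{2^{|B|}} := \mathbb{E}_{x_B^{(0)}, x_B^{(1)} \in X_B} \Big( \prod_{\omega_B \in \{0,1\}^B} \mathcal{C}^{|\omega_B|} g_B(x_B^{(\omega_B)}) \Big) \prod_{C \subsetneq B} \prod_{\omega_C \in \{0,1\}^C} \nu_C(x_C^{(\omega_C)}).$$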

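Remark. It follows from (B.10) below that the right-hand side of the last equation is non-negative, and so $\|g_B\|_{\Box(\nu; X_B)}$ is well-defined. Note, for instance, that

$$\|\nu_B\|_{\Box(\nu; X_B)}^{2^{|B|}} = \mathbb{E}_{x_B^{(0)}, x_B^{(1)} \in X_B} \prod_{C \subseteq B} \prod_{\omega_C \in \{0,1\}^C} \nu_C(x_C^{(\omega_C)})$$

and

$$\|f_B\|_{\Box(1; X_B)} = \|f_B\|_{\Box(X_B)}.$$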



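Proof. By a limiting argument, we may assume that the $\nu_B$ are strictly positive throughout $X_B$. We refactorise

$$\prod_{B \subseteq A} f_B(x_B) = \prod_{B \subseteq A} f'_B(x_B),$$

where

$$f'_B(x_B) := \frac{f_B(x_B)}{\nu_B(x_B)} \prod_{C \subseteq B} \nu_C(x_C)^{1/2^{|A|-|C|}}.$$

Applying Corollary B.3, we can thus bound the left-hand side of (B.8) by

$$\|f'_A\|_{\Box(X_A)} \prod_{B \subsetneq A} \big\| |f'_B|^{2^{|A|-|B|}} \big\|_{\Box(X_B)}^{1/2^{|A|-|B|}}.$$

However, direct calculation shows that

$$\|f'_A\|_{\Box(X_A)} = \|f_A\|_{\Box(\nu; X_A)}. \tag{B.10}$$

Meanwhile, the pointwise bound

$$|f'_B(x_B)|^{2^{|A|-|B|}} \leq \prod_{C \subseteq B} \nu_C(x_C)^{1/2^{|B|-|C|}}$$

together with (B.3) gives

$$\big\| |f'_B|^{2^{|A|-|B|}} \big\|_{\Box(X_B)}^{1/2^{|A|-|B|}} \leq \Big\| \prod_{C \subseteq B} \nu_C^{1/2^{|B|-|C|}} \Big\|_{\Box(X_B)}^{1/2^{|A|-|B|}} = \|\nu_B\|_{\Box(\nu; X_B)}^{1/2^{|A|-|B|}},$$

and the claim follows. □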
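The refactorisation in the proof of Corollary B.4 rests on a simple counting fact: a fixed $C \subseteq A$ is contained in exactly $2^{|A|-|C|}$ sets $B$ with $C \subseteq B \subseteq A$. The routine verification of the refactorisation and of the pointwise bound is spelled out below in LaTeX. This is our own bookkeeping, offered as a reading aid rather than as text from the original argument.

```latex
% Each C \subseteq A lies in exactly 2^{|A|-|C|} sets B with C \subseteq B \subseteq A, so
\prod_{B \subseteq A} f'_B(x_B)
  = \prod_{B \subseteq A} \frac{f_B(x_B)}{\nu_B(x_B)}
    \prod_{C \subseteq A} \nu_C(x_C)^{2^{|A|-|C|}/2^{|A|-|C|}}
  = \prod_{B \subseteq A} \frac{f_B(x_B)}{\nu_B(x_B)} \prod_{C \subseteq A} \nu_C(x_C)
  = \prod_{B \subseteq A} f_B(x_B).
% Since |f_B| \le \nu_B, the factor f_B / \nu_B has modulus at most 1, hence
|f'_B(x_B)|^{2^{|A|-|B|}}
  \le \prod_{C \subseteq B} \nu_C(x_C)^{2^{|A|-|B|}/2^{|A|-|C|}}
  = \prod_{C \subseteq B} \nu_C(x_C)^{1/2^{|B|-|C|}}.
```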



Remark. In order for this inequality to be useful, one needs to compare the weighted $\Box$ norm $\|f\|_{\Box(\nu; X_A)}$ with the unweighted norm $\|f\|_{\Box(X_A)}$. For a fixed set of weights $\nu$, this is not possible when the $\nu$ are unbounded. However, suppose the $\nu$ also depend on an additional parameter $y$ and obey suitable "linear forms conditions". Then we will be able to establish comparability estimates of this type after averaging in $y$. See Appendix C; similar ideas appear in [21].



Now we pass from this abstract setting to a more "additive" setting. Given any $s \geq 0$, any finite additive group $Z$ and any function $f : Z \to \mathbb{C}$, we define the Gowers uniformity norm $\|f\|_{U^{s+1}(Z)}$ by the formula

$$\|f\|_{U^{s+1}(Z)} := \|f(x_1 + \ldots + x_{s+1})\|_{\Box^{s+1}(Z^{s+1})}.$$

Equivalently, we have

$$\|f\|_{U^{s+1}(Z)}^{2^{s+1}} = \mathbb{E}_{x^{(0)}, x^{(1)} \in Z^{s+1}} \prod_{\omega \in \{0,1\}^{s+1}} \mathcal{C}^{|\omega|} f\Big( \sum_{j=1}^{s+1} x_j^{(\omega_j)} \Big).$$

Because the $U^{s+1}(Z)$ norm is derived from the box norm of dimension $s+1$, many properties of the latter norm automatically descend to the former norm. For instance, the $U^{s+1}(Z)$ norm is indeed a norm for $s \geq 1$, and from (B.4) we have the invariance

$$\|e(\phi) f\|_{U^{s+1}(Z)} = \|f\|_{U^{s+1}(Z)} \tag{B.11}$$

whenever $s \geq 1$ and $\phi : Z \to \mathbb{R}/\mathbb{Z}$ is an affine-linear phase or, more generally, a polynomial phase of degree at most $s$. In our applications we shall take $Z$ to be a cyclic group $\mathbb{Z}_{N'}$, and our functions $f$ shall usually be real-valued. Also, from Lemma B.2 we have the Gowers–Cauchy–Schwarz inequality for $Z$, which was first observed in [21] and reads as follows:

$$\Big| \mathbb{E}_{x^{(0)}, x^{(1)} \in Z^{s+1}} \prod_{\omega \in \{0,1\}^{s+1}} \mathcal{C}^{|\omega|} f_\omega\Big( \sum_{j=1}^{s+1} x_j^{(\omega_j)} \Big) \Big| \leq \prod_{\omega \in \{0,1\}^{s+1}} \|f_\omega\|_{U^{s+1}(Z)}.$$
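The invariance (B.11) is easy to observe numerically on a small cyclic group. The Python sketch below computes $\|f\|_{U^{s+1}(\mathbb{Z}_N)}$ by brute force, using the equivalent parametrisation $\mathbb{E}_{x \in \mathbb{Z}_N,\, h \in \mathbb{Z}_N^{s+1}} \prod_\omega \mathcal{C}^{|\omega|} f(x + \omega \cdot h)$. This parametrisation is obtained from the display above by the substitution $x = \sum_j x_j^{(0)}$, $h = x^{(1)} - x^{(0)}$. The choice $N = 7$ and the test functions are illustrative assumptions.

```python
import cmath
import itertools
from typing import List, Sequence


def gowers_norm_cyclic(f: Sequence[complex], s: int) -> float:
    """||f||_{U^{s+1}(Z_N)} by brute force, where N = len(f)."""
    n, k = len(f), s + 1
    cube = list(itertools.product((0, 1), repeat=k))
    total = 0j
    for x in range(n):
        for h in itertools.product(range(n), repeat=k):
            prod = 1 + 0j
            for omega in cube:
                v = complex(f[(x + sum(w * hj for w, hj in zip(omega, h))) % n])
                prod *= v.conjugate() if sum(omega) % 2 else v   # C^{|omega|}
            total += prod
    avg = total / n ** (k + 1)
    return max(avg.real, 0.0) ** (1.0 / 2 ** k)


def quadratic_phase(n: int, a: int) -> List[complex]:
    """x -> e(a x^2 / n) on Z_n (well defined because (x + n)^2 = x^2 mod n)."""
    return [cmath.exp(2j * cmath.pi * a * x * x / n) for x in range(n)]
```

A quadratic phase leaves the $U^3$ norm ($s = 2$) unchanged and has $U^3$ norm exactly 1. The $U^2$ norm does not have this invariance: for a prime $N$ and $a \not\equiv 0$, $\|e(ax^2/N)\|_{U^2(\mathbb{Z}_N)} = N^{-1/4}$.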



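For technical reasons we shall need to localise the Gowers norms slightly. Let $A$ be any finite non-empty subset of an additive group $Z$, which may or may not be finite. Then for any $f : A \to \mathbb{C}$, we define the Gowers uniformity norm $\|f\|_{U^{s+1}(A)}$ by the formula

$$\|f\|_{U^{s+1}(A)} := \left( \frac{\sum_{x \in Z,\, h \in Z^{s+1}} \prod_{\omega \in \{0,1\}^{s+1}} \mathcal{C}^{|\omega|} (f 1_A)(x + \omega \cdot h)}{\sum_{x \in Z,\, h \in Z^{s+1}} \prod_{\omega \in \{0,1\}^{s+1}} 1_A(x + \omega \cdot h)} \right)^{1/2^{s+1}}. \tag{B.13}$$

Here $f 1_A$ denotes the extension of $f$ by zero to $Z$, and $\omega \cdot h := \omega_1 h_1 + \ldots + \omega_{s+1} h_{s+1}$. Both sums are finite, since $A$ is finite.

In the particular case $A = [N]$, which is used several times in the paper, we shall adopt the abbreviation

$$\|f\|_{U^{s+1}[N]} := \|f\|_{U^{s+1}([N])}.$$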
If $A$ is contained in a finite additive group $Z$, then these local Gowers norms are related to their global counterparts by the identity

$$\|f\|_{U^{s+1}(A)} = \|f 1_A\|_{U^{s+1}(Z)} / \|1_A\|_{U^{s+1}(Z)} \tag{B.14}$$

for any $f : A \to \mathbb{C}$, where $f 1_A : Z \to \mathbb{C}$ is the extension by zero of $f$ from $A$ to $Z$.

The local norm $U^{s+1}(A)$ is also intrinsic in the following sense. Suppose $A \subseteq Z$, $A' \subseteq Z'$, and $\phi : A \to A'$ is a Freiman isomorphism, in the sense that:

- $\phi$ is a bijection from $A$ onto $A'$;
- for any $a_1, a_2, a_3, a_4 \in A$, we have $a_1 + a_2 = a_3 + a_4$ if and only if $\phi(a_1) + \phi(a_2) = \phi(a_3) + \phi(a_4)$.

Then $\|f \circ \phi\|_{U^{s+1}(A)} = \|f\|_{U^{s+1}(A')}$ for all $f : A' \to \mathbb{C}$. A particular consequence of this is the following lemma.

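Lemma B.5 (Comparability of $U^{s+1}(I)$ and $U^{s+1}(\mathbb{Z}_{N'})$). Let $N' \geq 1$ be an integer, let $\alpha > 0$, and let $I = \{a, a+1, \ldots, b\}$ be an interval of integers whose length satisfies $\alpha N' \leq |I| \leq N'/2$. Let $f : I \to \mathbb{C}$ be a function on $I$. Let $\tilde f : \mathbb{Z}_{N'} \to \mathbb{C}$ be the function formed from $f$ by identifying $I$ with a subset of $\mathbb{Z}_{N'}$ and setting $\tilde f(x) = 0$ for $x \notin I$. Then we have

$$\|\tilde f\|_{U^{s+1}(\mathbb{Z}_{N'})} = c \|f\|_{U^{s+1}(I)}, \tag{B.15}$$

where $c = c_{I, N', s} > 0$ is a constant which is independent of $f$, and which is bounded above and below by quantities depending only on $\alpha$ and $s$.

Proof. As $|I| \leq N'/2$, the interval $I \subseteq \mathbb{Z}$ is Freiman isomorphic to its counterpart in $\mathbb{Z}_{N'}$. Indeed, if $a_1, \ldots, a_4 \in I$ then $|a_1 + a_2 - a_3 - a_4| < 2|I| \leq N'$, so $a_1 + a_2 \equiv a_3 + a_4 \pmod{N'}$ forces $a_1 + a_2 = a_3 + a_4$. The claim then follows from (B.14), together with the easily confirmed observation that $\|1_I\|_{U^{s+1}(\mathbb{Z}_{N'})}$ is bounded above and below by quantities depending only on $\alpha$ and $s$. □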
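The proof of Lemma B.5 shows that the constant in (B.15) is $c = \|1_I\|_{U^{s+1}(\mathbb{Z}_{N'})}$. The Python sketch below checks this in the case $s = 1$, $N' = 20$, $I = \{0, \ldots, 9\}$, so that $|I| = N'/2$. It computes the local norm directly from the $(x, h)$ form of (B.13) over the integers and compares it with the cyclic norm of the zero extension. The specific test functions are arbitrary, and real-valued $f$ is assumed so that the conjugations can be dropped.

```python
import itertools
from typing import Dict, Sequence


def u_norm_cyclic(f: Sequence[float], s: int) -> float:
    """||f||_{U^{s+1}(Z_n)} for real f, n = len(f), via E_{x,h} prod_omega f(x + omega.h)."""
    n, k = len(f), s + 1
    cube = list(itertools.product((0, 1), repeat=k))
    total = 0.0
    for x in range(n):
        for h in itertools.product(range(n), repeat=k):
            prod = 1.0
            for omega in cube:
                prod *= f[(x + sum(w * hj for w, hj in zip(omega, h))) % n]
            total += prod
    return max(total / n ** (k + 1), 0.0) ** (1.0 / 2 ** k)


def u_norm_interval(f: Dict[int, float], a: int, b: int, s: int) -> float:
    """Local norm ||f||_{U^{s+1}(I)} for I = {a, ..., b} inside Z, following (B.13)."""
    k, span = s + 1, b - a
    cube = list(itertools.product((0, 1), repeat=k))
    num = den = 0.0
    for x in range(a, b + 1):                      # omega = 0 vertex must lie in I
        for h in itertools.product(range(-span, span + 1), repeat=k):
            pts = [x + sum(w * hj for w, hj in zip(omega, h)) for omega in cube]
            if all(a <= p <= b for p in pts):      # otherwise every term vanishes
                den += 1
                prod = 1.0
                for p in pts:
                    prod *= f[p]
                num += prod
    return max(num / den, 0.0) ** (1.0 / 2 ** k)


def embed(f: Dict[int, float], n: int) -> list:
    """Zero extension of f from I to Z_n."""
    return [f.get(x, 0.0) for x in range(n)]
```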



Remark. We will typically apply this lemma with $I = [N]$ and with $N'$ comparable to a moderately large multiple of $N$. See, for example, the proof of Proposition 10.1.



Appendix C. Proof of the generalised von Neumann theorem 



The purpose of this appendix is to prove Proposition 7.1.

Proposition 7.1 (Generalised von Neumann theorem). Let $s, t, d, L$ be positive integer parameters as usual. Then there are constants $C_1$ and $D$, depending on $s, t, d$ and $L$, such that the following is true. Let $C$ be arbitrary with $C_1 \leq C \leq O_{s,t,d,L}(1)$, and suppose that $N' \in [CN, 2CN]$ is a prime. Let $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ be a $D$-pseudorandom measure, and suppose that $f_1, \ldots, f_t : [N] \to \mathbb{R}$ are functions with $|f_i(x)| \leq \nu(x)$ for all $i \in [t]$ and $x \in [N]$. Suppose that $\Psi = (\psi_1, \ldots, \psi_t)$ is a system of affine-linear forms in $s$-normal form with $\|\Psi\|_N \leq L$. Let $K \subseteq [-N, N]^d$ be a convex body such that $\Psi(K) \subseteq [N]^t$.

Suppose also that

$$\min_{1 \leq i \leq t} \|f_i\|_{U^{s+1}[N]} \leq \delta$$

for some $\delta > 0$. Then we have

Recall that this is a variant of [23, Proposition 5.3], which was proven by a long series of applications of the Cauchy–Schwarz inequality. We shall phrase our argument using Corollary B.3, but the argument is essentially that of [24, §5]. It is also necessary to perform some regularisation to deal with the convex body $K$, a technical feature not present in [23, Proposition 5.3].

Moving to a cyclic group. Let us first make some very minor reductions. We start by moving the whole problem to the group $\mathbb{Z}_{N'}$. We will always assume that $N' = O_{s,t,d,L}(N)$. However, one may wish to take $N'$ to be quite a bit larger than $N$, so that a pseudorandom measure $\nu$ can be constructed to make Proposition 7.1 applicable. We embed $[N]$ inside $\mathbb{Z}_{N'}$ in the usual manner, and extend the functions $f_1, \ldots, f_t$ to all of $\mathbb{Z}_{N'}$ by defining them to be zero outside of $[N]$. From Lemma B.5 we then have

$$\|f_j\|_{U^{s+1}(\mathbb{Z}_{N'})} \ll \delta$$

for some $j \in \{1, \ldots, t\}$. Similarly, we may identify the set $K \cap \mathbb{Z}^d$ with a subset $K'$ of $\mathbb{Z}_{N'}^d$. We can also view $\Psi$ as a map from $\mathbb{Z}_{N'}^d$ to $\mathbb{Z}_{N'}^t$. Note that $\Psi$ will then map $K'$ to $[N]^t$. To summarise, we have reduced matters to establishing the following.

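Proposition 7.1′ (Transfer to $\mathbb{Z}_{N'}$). Let $s, t, d, L$ be positive integer parameters as usual. Then there is a constant $D$, depending on $s, t, d$ and $L$, such that the following is true. Let $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ be a $D$-pseudorandom measure, and suppose that $f_1, \ldots, f_t : \mathbb{Z}_{N'} \to \mathbb{R}$ are functions with $|f_i(x)| \leq \nu(x)$ for all $i \in [t]$ and $x \in \mathbb{Z}_{N'}$. Suppose that $\Psi = (\psi_1, \ldots, \psi_t)$ is a system of affine-linear forms in $s$-normal form with $\|\Psi\|_N \leq L$. Let $K' \subseteq \mathbb{Z}_{N'}^d$ be identified with $K \cap \mathbb{Z}^d$ for some convex $K \subseteq [-\tfrac12 N', \tfrac12 N']^d$. Suppose also that

$$\min_{j} \|f_j\|_{U^{s+1}(\mathbb{Z}_{N'})} \leq \delta$$

for some $\delta > 0$. Then we have

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}^d} 1_{K'}(n) \prod_{i \in [t]} f_i(\psi_i(n)) = o_\delta(1) + \kappa(\delta). \tag{C.2}$$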

Remark. Note the disappearance of $C$. This was an artefact of the relationship between $N$ and $N'$, which has now been forgotten.

From this point onwards we do our linear algebra over $\mathbb{Z}_{N'}$, rather than over $\mathbb{Q}$. Note that the notion of $s$-normal form coincides in the two settings, provided that $N' \geq N_0(s, t, d, L)$ is sufficiently large. Furthermore, no two of the homogeneous parts $\dot\psi_i$ are parallel when considered (mod $N'$). This fact (which is very easily checked) is a simple instance of a kind of "Lefschetz principle".

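Removing the convex cutoff. The next step is to partially eliminate the cutoff $1_{K'}(n)$ by replacing it with a more analytically tractable Lipschitz cutoff. We introduce a metric on $\mathbb{Z}_{N'}^d$ by declaring the distance between $(n_1, \ldots, n_d)$ and $(m_1, \ldots, m_d)$ to be $\big( \sum_{j=1}^d \|(n_j - m_j)/N'\|_{\mathbb{R}/\mathbb{Z}}^2 \big)^{1/2}$. Here $\|x\|_{\mathbb{R}/\mathbb{Z}}$ denotes the distance from $x$ to the nearest integer. This is the metric induced from the standard embedding of $\mathbb{Z}_{N'}^d$ into the torus $(\mathbb{R}/\mathbb{Z})^d$. To establish Proposition 7.1′, we claim that it suffices to establish the bound

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}^d} F(n) \prod_{i \in [t]} f_i(\psi_i(n)) = o_{\delta, M}(1) + \kappa_M(\delta) \tag{C.3}$$

under the following conditions: $M > 0$; $F : \mathbb{Z}_{N'}^d \to [-1, 1]$ has Lipschitz constant $M$; and the functions $f_i$ are bounded pointwise by $\nu$ and satisfy $\min_{i \in [t]} \|f_i\|_{U^{s+1}(\mathbb{Z}_{N'})} \leq \delta$. To see why this suffices, let $\varepsilon > 0$ be a small quantity to be chosen later. It will suffice to prove that

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}^d} 1_{K'}(n) \prod_{i \in [t]} f_i(\psi_i(n)) = o_{\delta, \varepsilon}(1) + \kappa_\varepsilon(\delta) + \kappa(\varepsilon),$$

as the claim then follows by setting $\varepsilon$ to be a sufficiently slowly decaying function of $\delta$.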

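To establish this bound, we apply Corollary A.3 to effect the decomposition

$$1_{K'}(n) = F_\varepsilon(n) + O(G_\varepsilon(n))$$

for all $n \in \mathbb{Z}_{N'}^d$, where $F_\varepsilon, G_\varepsilon : \mathbb{Z}_{N'}^d \to [0, 1]$ are Lipschitz in the above metric with constant $O(1/\varepsilon)$. Furthermore, from the Lipschitz and integral bounds in Corollary A.3, we easily obtain the estimate

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}^d} G_\varepsilon(n) = o_\varepsilon(1) + \kappa(\varepsilon). \tag{C.4}$$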

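Here we are basically using nothing more than the standard fact that Lipschitz functions are uniformly Riemann integrable. From (C.3) we have

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}^d} F_\varepsilon(n) \prod_{i \in [t]} f_i(\psi_i(n)) = o_{\delta, \varepsilon}(1) + \kappa_\varepsilon(\delta),$$

and so, by the triangle inequality and the fact that $|f_i(x)| \leq \nu(x)$, it is enough to show that

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}^d} G_\varepsilon(n) \prod_{i \in [t]} \nu(\psi_i(n)) = o_\varepsilon(1) + \kappa(\varepsilon). \tag{C.5}$$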

Now a standard application of the linear forms condition (see [24, Lemma 5.2]) gives

$$\|\nu - 1\|_{U^{s+1}(\mathbb{Z}_{N'})} = o(1).$$

The function $\tfrac12(\nu - 1)$ satisfies $|\tfrac12(\nu(x) - 1)| \leq \tfrac12(\nu(x) + 1)$, and this latter function is easily seen to be a pseudorandom measure (see [21, Lemma 3.4]). Thus from (C.3) we have

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}^d} G_\varepsilon(n) \prod_{i \in [t]} g_i(\psi_i(n)) = o_\varepsilon(1)$$

whenever all the functions $g_i$ are either $1$ or $\nu - 1$, and not all of them are $1$. When $g_i = 1$ for all $i$, we have the bound $o_\varepsilon(1) + \kappa(\varepsilon)$ from (C.4). The bound (C.5) now follows immediately upon writing $\nu = 1 + (\nu - 1)$ and expanding as a sum of $2^t$ terms.

It remains to prove (C.3). We now claim that we may dispense with the Lipschitz cutoff $F$ entirely, and reduce to proving the estimate

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}^d} \prod_{i \in [t]} f_i(\psi_i(n)) = o_\delta(1) + \kappa(\delta), \tag{C.6}$$

which involves no cutoff function at all. To see this, first observe that (C.6) implies the extension

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}^d} e(m \cdot n / N') \prod_{i \in [t]} f_i(\psi_i(n)) = o_\delta(1) + \kappa(\delta) \tag{C.7}$$

for any frequency $m \in \mathbb{Z}_{N'}^d$. Indeed, if $m$ lies in the span of $\psi_1, \ldots, \psi_t$, then we may simply factor $e(m \cdot n / N')$ into terms that can be absorbed into the $f_1, \ldots, f_t$ factors. Here we note that (C.6) trivially extends to the case when $f_1, \ldots, f_t$ are complex-valued instead of real-valued. If $m$ does not lie in this span, then it is easy to see that the left-hand side of (C.7) in fact vanishes.

Now we return to (C.3). Let $X > 0$ be arbitrary. By a standard Fourier-analytic argument, given in detail in [27, Lemma A.9], we may decompose

$$F(n) = \sum_{j=1}^{J} c_j e(m_j \cdot n / N') + O_d(M \log X / X),$$

where $J = O_d(X^d)$, the $c_j = O(1)$ are coefficients, and the $m_j \in \mathbb{Z}_{N'}^d$ are frequencies. Inserting this into (C.3), we have

$$\mathbb{E}_{n \in \mathbb{Z}_{N'}^d} F(n) \prod_{i \in [t]} f_i(\psi_i(n)) = \sum_{j=1}^{J} c_j\, \mathbb{E}_{n \in \mathbb{Z}_{N'}^d} e(m_j \cdot n / N') \prod_{i \in [t]} f_i(\psi_i(n)) + O_d\Big( \frac{M \log X}{X} \Big) \mathbb{E}_{n \in \mathbb{Z}_{N'}^d} \prod_{i \in [t]} \nu(\psi_i(n)).$$

We use (C.7) to control the first term and the linear forms condition to estimate the second. We see that this is bounded by

$$O_d(X^d)\big( o_\delta(1) + \kappa(\delta) \big) + O_d\Big( \frac{M \log X}{X} \Big)\big( 1 + o(1) \big).$$

Taking $X$ to be a sufficiently slowly growing function of $1/\delta$, we obtain (C.3), as desired.

Main argument. It remains to prove (C.6). By symmetry we may assume that $f_1$ is the function with minimal $U^{s+1}$ norm, thus

$$\|f_1\|_{U^{s+1}(\mathbb{Z}_{N'})} \leq \delta.$$

Recall that the system $\Psi : \mathbb{Z}_{N'}^d \to \mathbb{Z}_{N'}^t$ is in $s$-normal form. By permuting the basis vectors $e_1, \ldots, e_d$ if necessary, we may then assume that $\prod_{j=1}^{s+1} \dot\psi_i(e_j)$ vanishes for $i \neq 1$ and is non-zero for $i = 1$.
In summary, we are reduced to proving the following.

Proposition 7.1′′ (Reduced generalised von Neumann theorem). Let $s, t, d, L$ be positive integer parameters as usual. Then there is a constant $D$, depending on $s, t, d$ and $L$, such that the following is true. Let $\nu : \mathbb{Z}_{N'} \to \mathbb{R}^+$ be a $D$-pseudorandom measure, and suppose that $f_1, \ldots, f_t : \mathbb{Z}_{N'} \to \mathbb{R}$ are functions with $|f_i(x)| \leq \nu(x)$ for all $i \in [t]$ and $x \in \mathbb{Z}_{N'}$. Suppose that $\Psi = (\psi_1, \ldots, \psi_t)$ is a system of affine-linear forms such that $\prod_{j=1}^{s+1} \dot\psi_i(e_j)$ vanishes for $i \neq 1$ and is non-zero for $i = 1$. Then we have

$$\Big| \mathbb{E}_{n \in \mathbb{Z}_{N'}^d} \prod_{i \in [t]} f_i(\psi_i(n)) \Big| \leq \|f_1\|_{U^{s+1}(\mathbb{Z}_{N'})} + o(1). \tag{C.8}$$

To prove the estimate (C.8), note first that the coefficients $\dot\psi_1(e_j)$, $j \in [s+1]$, are non-zero and bounded by $O_{s,t,d,L}(1)$. Hence they are invertible in $\mathbb{Z}_{N'}$, provided that $N' \geq N_0(s, t, d, L)$. Thus we may dilate the first $s+1$ variables and assume that $\dot\psi_1(e_j) = 1$ for $j \in [s+1]$, a manoeuvre which affords a little notational simplicity if nothing more. With this normalisation we have, writing $n = (x_1, \ldots, x_d)$ and $y = (x_{s+2}, \ldots, x_d)$, that

$$\psi_1(x_1, \ldots, x_{s+1}, y) = x_1 + \ldots + x_{s+1} + \psi_1(0, y).$$

The other forms $\psi_i$, $i = 2, \ldots, t$, do not involve all of the variables $x_1, \ldots, x_{s+1}$, since the system $\Psi$ is in normal form. This will be a crucial fact for us. To handle it, we look, for each $\psi_i$, at the set $\Omega(i)$ of indices $j \in [s+1]$ for which $\dot\psi_i(e_j) \neq 0$, and then group the forms according to their associated set $\Omega(i)$. Thus $\Omega(1) = [s+1]$ and $\Omega(i) \subsetneq [s+1]$ for $i = 2, \ldots, t$. Observe that the indices $j = s+2, \ldots, d$ and the associated variable $y = (x_{s+2}, \ldots, x_d)$ will be largely irrelevant in the sequel. With this nomenclature we may write the left-hand side of (C.8) as

$$\Big| \mathbb{E}_{y \in \mathbb{Z}_{N'}^{d-s-1}} \mathbb{E}_{x_{[s+1]} \in \mathbb{Z}_{N'}^{s+1}} \prod_{B \subseteq [s+1]} F_{B,y}(x_B) \Big|, \tag{C.9}$$

where $x_{[s+1]} = (x_j)_{j \in [s+1]}$, $x_B$ is the restriction of $x_{[s+1]}$ to $B$, and

$$F_{B,y}(x_B) := \prod_{i \in [t] : \Omega(i) = B} f_i(\psi_i(x_B, y)).$$

We have abused notation ever so slightly by regarding $\psi_i$ as a function on $\mathbb{Z}_{N'}^B \times \mathbb{Z}_{N'}^{d-s-1}$ rather than on $\mathbb{Z}_{N'}^{s+1} \times \mathbb{Z}_{N'}^{d-s-1}$. That is, we suppress mention of the irrelevant variables $x_j$, $j \in [s+1] \setminus \Omega(i)$. Observe that

$$F_{[s+1],y}(x_{[s+1]}) = f_1(\psi_1(x_{[s+1]}, y)) = f_1(x_1 + \ldots + x_{s+1} + \psi_1(0, y)).$$

Now we have the pointwise bounds $|F_{B,y}(x_B)| \leq \nu_{B,y}(x_B)$, where

$$\nu_{B,y}(x_B) := \prod_{i \in [t] : \Omega(i) = B} \nu(\psi_i(x_B, y)).$$



Footnote (to the dilation of the first $s+1$ variables above): This dilation converts the coefficients from bounded integers to rationals with bounded numerator and denominator. However, when the time comes to apply the linear forms condition, one can clear denominators and reduce back to estimates involving only bounded integers again.
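Invoking Corollary B.4, we may bound (C.9) by

$$\mathbb{E}_{y \in \mathbb{Z}_{N'}^{d-s-1}} \|F_{[s+1],y}\|_{\Box(\nu_y; \mathbb{Z}_{N'}^{s+1})} \prod_{B \subsetneq [s+1]} \|\nu_{B,y}\|_{\Box(\nu_y; \mathbb{Z}_{N'}^B)}^{1/2^{s+1-|B|}}.$$

Here $\nu_y$ denotes the family of weights $(\nu_{B,y})_{B \subseteq [s+1]}$. The reader may wish to recall the definitions of the quantities appearing here, which are provided in the statement of Corollary B.4.

Applying Hölder's inequality, we see that to show (C.8) it suffices to show that

$$\mathbb{E}_{y \in \mathbb{Z}_{N'}^{d-s-1}} \|F_{[s+1],y}\|_{\Box(\nu_y; \mathbb{Z}_{N'}^{s+1})}^{2^{s+1}} = \|f_1\|_{U^{s+1}(\mathbb{Z}_{N'})}^{2^{s+1}} + o(1) \tag{C.10}$$

and that

$$\mathbb{E}_{y \in \mathbb{Z}_{N'}^{d-s-1}} \|\nu_{B,y}\|_{\Box(\nu_y; \mathbb{Z}_{N'}^B)}^{2^{|B|}} = 1 + o(1) \tag{C.11}$$

for all non-empty $B \subseteq [s+1]$. Note that, except for $f_1$, the unknown functions $f_2, \ldots, f_t$ have all been eliminated. This procedure will be familiar to readers who have looked at (for example) [21, Ch. 5].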
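The exponents in (C.10) and (C.11) are dictated by Hölder's inequality. Write $a_y$ for the first norm and $b_{B,y}$ for the weighted norms of the $\nu_{B,y}$. One then applies Hölder with every exponent equal to $2^{s+1}$; there are $2^{s+1}$ factors in total, namely $a_y$ together with one factor for each proper subset $B$. The bookkeeping is sketched below in LaTeX as a reading aid, not as a verbatim step of the original proof.

```latex
% a_y     := \|F_{[s+1],y}\|_{\Box(\nu_y;\mathbb{Z}_{N'}^{s+1})}
% b_{B,y} := \|\nu_{B,y}\|_{\Box(\nu_y;\mathbb{Z}_{N'}^{B})}
\mathbb{E}_y\, a_y \prod_{B \subsetneq [s+1]} b_{B,y}^{1/2^{s+1-|B|}}
  \le \big(\mathbb{E}_y\, a_y^{2^{s+1}}\big)^{1/2^{s+1}}
      \prod_{B \subsetneq [s+1]} \big(\mathbb{E}_y\, b_{B,y}^{2^{|B|}}\big)^{1/2^{s+1}},
% valid because (1/2^{s+1-|B|}) \cdot 2^{s+1} = 2^{|B|}, and the reciprocal exponents sum to one:
\frac{1}{2^{s+1}} + \sum_{B \subsetneq [s+1]} \frac{1}{2^{s+1}}
  = \frac{1 + (2^{s+1} - 1)}{2^{s+1}} = 1.
```

Inserting (C.10) and (C.11), the right-hand side is $(\|f_1\|_{U^{s+1}}^{2^{s+1}} + o(1))^{1/2^{s+1}}(1 + o(1)) \leq \|f_1\|_{U^{s+1}} + o(1)$, which is (C.8).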



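We begin with (C.11). We expand the left-hand side, obtaining

$$\mathbb{E}_{y} \mathbb{E}_{x_B^{(0)}, x_B^{(1)} \in \mathbb{Z}_{N'}^B} \prod_{C \subseteq B} \prod_{\omega_C \in \{0,1\}^C} \nu_{C,y}(x_C^{(\omega_C)}) = \mathbb{E}_{y} \mathbb{E}_{x_B^{(0)}, x_B^{(1)} \in \mathbb{Z}_{N'}^B} \prod_{C \subseteq B} \prod_{\omega_C \in \{0,1\}^C} \prod_{i \in [t] : \Omega(i) = C} \nu(\psi_i(x_C^{(\omega_C)}, y)).$$

Recall the definition of $\Omega$ and the hypothesis that no two of the $\psi_i$ are affine-linear combinations of each other. It follows that the affine-linear forms

$$(x_B^{(0)}, x_B^{(1)}, y) \mapsto \psi_i(x_C^{(\omega_C)}, y)$$

also have the property that no two forms are affine-linear combinations of each other. Here $C$ varies over subsets of $B$, $\omega_C$ varies over $\{0,1\}^C$, and $i$ varies over those $i \in [t]$ such that $\Omega(i) = C$. In other words, this system has finite complexity. Thus (C.11) will follow from the linear forms condition (6.2), provided that the degree $D$ of pseudorandomness is sufficiently large.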



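Now we turn to (C.10). Since $f_1$ is real-valued, the conjugations in the definition of the weighted box norm may be dropped, and the left-hand side expands as

$$\mathbb{E}_{y} \mathbb{E}_{x^{(0)}, x^{(1)} \in \mathbb{Z}_{N'}^{s+1}} \prod_{\omega \in \{0,1\}^{s+1}} f_1\Big( \sum_{j=1}^{s+1} x_j^{(\omega_j)} + \psi_1(0, y) \Big) \times \prod_{C \subsetneq [s+1]} \prod_{\omega_C \in \{0,1\}^C} \prod_{i \in [t] : \Omega(i) = C} \nu(\psi_i(x_C^{(\omega_C)}, y)).$$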



Footnote (to the application of Hölder's inequality above): This is really an application of the Cauchy–Schwarz inequality several times, since the exponent is a power of two.
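Substituting $h := x^{(1)} - x^{(0)}$ and $z := x_1^{(0)} + \ldots + x_{s+1}^{(0)} + \psi_1(0, y)$, we may rewrite this as

$$\mathbb{E}_{y} \mathbb{E}_{x^{(0)}, h \in \mathbb{Z}_{N'}^{s+1}} \prod_{\omega \in \{0,1\}^{s+1}} f_1\Big( z + \sum_{j=1}^{s+1} \omega_j h_j \Big) \prod_{C \subsetneq [s+1]} \prod_{\omega_C \in \{0,1\}^C} \prod_{i \in [t] : \Omega(i) = C} \nu\Big( \psi_i(x_C^{(0)}, y) + \sum_{j \in C} \omega_j \dot\psi_i(e_j) h_j \Big).$$

Observe that, for fixed $h$, the map $(x^{(0)}, y) \mapsto z$ is uniform, in the sense that each $z \in \mathbb{Z}_{N'}$ has exactly $(N')^{d-1}$ preimages. Thus we may rewrite the preceding expression as

$$\mathbb{E}_{z \in \mathbb{Z}_{N'},\, h \in \mathbb{Z}_{N'}^{s+1}} W(z, h) \prod_{\omega \in \{0,1\}^{s+1}} f_1\Big( z + \sum_{j=1}^{s+1} \omega_j h_j \Big),$$

where

$$W(z, h) := \mathbb{E}_{\substack{x^{(0)} \in \mathbb{Z}_{N'}^{s+1},\, y \in \mathbb{Z}_{N'}^{d-s-1} \\ x_1^{(0)} + \ldots + x_{s+1}^{(0)} + \psi_1(0, y) = z}} \prod_{C \subsetneq [s+1]} \prod_{\omega_C \in \{0,1\}^C} \prod_{i \in [t] : \Omega(i) = C} \nu\Big( \psi_i(x_C^{(0)}, y) + \sum_{j \in C} \omega_j \dot\psi_i(e_j) h_j \Big).$$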
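The uniformity claim used to pass to $W(z, h)$ holds because $x_1^{(0)}$ is a free variable: once all the other coordinates are fixed, $z$ runs over $\mathbb{Z}_{N'}$ exactly once. The Python sketch below simply counts fibres for small parameters. The affine form $\psi_1(0, y)$ is replaced by a hypothetical example, `psi1_on_y`, chosen only for illustration.

```python
import itertools
from collections import Counter
from typing import Callable, Tuple


def fibre_sizes(n: int, s: int, d: int,
                psi1_on_y: Callable[[Tuple[int, ...]], int]) -> Counter:
    """Count preimages of z = x_1 + ... + x_{s+1} + psi1_on_y(y) in Z_n.

    x^(0) ranges over Z_n^{s+1} and y over Z_n^{d-s-1}, so there are n^d inputs.
    """
    counts: Counter = Counter()
    for x0 in itertools.product(range(n), repeat=s + 1):
        for y in itertools.product(range(n), repeat=d - s - 1):
            counts[(sum(x0) + psi1_on_y(y)) % n] += 1
    return counts
```

Every one of the $n$ values of $z$ should occur exactly $n^{d-1}$ times, whatever affine form is used for `psi1_on_y`.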
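Comparing this with (B.13), we see that to prove (C.10) it suffices to show that

$$\mathbb{E}_{z \in \mathbb{Z}_{N'},\, h \in \mathbb{Z}_{N'}^{s+1}} (W(z, h) - 1) \prod_{\omega \in \{0,1\}^{s+1}} f_1\Big( z + \sum_{j=1}^{s+1} \omega_j h_j \Big) = o(1).$$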

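By Cauchy–Schwarz and the hypothesis $|f_1(x)| \leq \nu(x)$, it suffices to establish the estimates

$$\mathbb{E}_{z \in \mathbb{Z}_{N'},\, h \in \mathbb{Z}_{N'}^{s+1}} (W(z, h) - 1)^n \prod_{\omega \in \{0,1\}^{s+1}} \nu\Big( z + \sum_{j=1}^{s+1} \omega_j h_j \Big) = \mathbf{1}_{n = 0} + o(1)$$

for $n = 0$ and $n = 2$. Expanding $(W - 1)^2 = W^2 - 2W + 1$, we reduce to showing that

$$\mathbb{E}_{z \in \mathbb{Z}_{N'},\, h \in \mathbb{Z}_{N'}^{s+1}} W(z, h)^n \prod_{\omega \in \{0,1\}^{s+1}} \nu\Big( z + \sum_{j=1}^{s+1} \omega_j h_j \Big) = 1 + o(1)$$

for $n = 0, 1, 2$.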

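This will follow from the linear forms condition. We shall just verify the case $n = 2$, as the cases $n = 0, 1$ follow from that case (they utilise a subset of the linear forms that are used in the $n = 2$ case). When $n = 2$, we can expand out the left-hand side as

$$\mathbb{E} \prod_{\omega \in \{0,1\}^{s+1}} \nu\Big( z + \sum_{j=1}^{s+1} \omega_j h_j \Big) \times \prod_{C \subsetneq [s+1]} \prod_{\omega_C \in \{0,1\}^C} \prod_{i \in [t] : \Omega(i) = C} \nu\Big( \psi_i(x_C^{(0)}, y) + \sum_{j \in C} \omega_j \dot\psi_i(e_j) h_j \Big)\, \nu\Big( \psi_i(\tilde x_C^{(0)}, \tilde y) + \sum_{j \in C} \omega_j \dot\psi_i(e_j) h_j \Big), \tag{C.12}$$

where the average is over all sextuples

$$(z, h, x^{(0)}, \tilde x^{(0)}, y, \tilde y) \in \mathbb{Z}_{N'} \times \mathbb{Z}_{N'}^{s+1} \times \mathbb{Z}_{N'}^{s+1} \times \mathbb{Z}_{N'}^{s+1} \times \mathbb{Z}_{N'}^{d-s-1} \times \mathbb{Z}_{N'}^{d-s-1}$$

subject to the affine constraints

$$z = x_1^{(0)} + \ldots + x_{s+1}^{(0)} + \psi_1(0, y) = \tilde x_1^{(0)} + \ldots + \tilde x_{s+1}^{(0)} + \psi_1(0, \tilde y). \tag{C.13}$$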



Naturally, we wish to apply the linear forms condition, on the assumption that v is D- 
pseudorandom for sufficiently large D. To do this we must first eliminate the constraints 
( 1C.13I) . To do this, we substitute for xf}_i and xf^^ in terms of the other variables, that 
is to say we write 



and 



x';i = z-xf 4o)_^^(o^^) 



xi2i = ^-4°^ -^i(0,y). 



In this way we may rewrite ( ]C.12I) as an unconstrained average over the 2d + s — 1 



variables /i, , , y, y. 

When written in terms of this set of variables, it is clear that all the linear forms in 
( ]C.12I) have integer coefficients which are bounded in terms of s, t, d and L. To apply 
the linear forms condition, all we must do is satisfy ourselves that no two of these forms 
are affinely dependent, that is to say no two of them have parallel homogeneous parts. 

To see this, first observe that the 2*"*"^ homogeneous forms z + XlSi ^j^j pairwise 



distinct, and that they are also different from any other form appearing in (1C.12P even 
after perfoi 
one of the 
of [s + l]). 



after performing the above substitutions, because the latter forms all involve at least 

.(0) ~(o) 

[s] ' ^[s] 



one of the variables from xf], xf] (here we are using the fact that C is a proper subset 



Now consider an affine form + cUj?/'i(ej)/ij appearing in ( 1C.12I) . Recalling 

that C = the set of all j for which ipi{ej) ^ 0, we see that in our new system of 
variables this form may be written as the slightly alarming expression 



^iitj^x^f +ujjhj) +ipi{es+i){z - xf^ xf^ -ipi{0,y) +Us+ihs+i) +ipi{0,y). 

i=i 

(C.14) 

There is a similar expression involving tildes. We claim first of all that at least one of 
the variables xf'\ . . . must appear with non-zero coefficient. If this were not the 
case then we would have ipi^Cj) = tpi^eg^i) for j ^ s and hence, since C C [s + 1], C is 
empty. Hence so is the product over uc G {0, 1}'" in ( ]C.12I) . Thus no form ( ]C.14I) with 
this property appears in (lC.12p . thereby confirming the claim. 



The claim just proved immediately implies that no form flC.14p has homogeneous part 



parallel to that of a form with tildes. It remains to prove that the forms in flC.14p have 
pairwise non-parallel homogeneous parts. 

Suppose that we are given a form flC.14p written as 

qirixf'^ + ■ ■ ■ + rgX^^^ + r'z + (terms involving h and y)) 
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where g 7^ 0. We claim that the set C from which the form came may be identified 
knowing only ri, . . . , r^, r' . Indeed we must have qr' = tpi{es+i), whence ipi{ej) = X{rj + 
r') for j ^ s. The set C may be found simply by looking at which of these quantities 
do not vanish. It is immediately clear that uc G {0, 1}*" may also be recovered. 

The only way in which two forms (1C.14I) could have parallel homogeneous parts, then, 
is if there is some fixed choice of u, some i ^ i' and some rational q,q' such that 

s s 

s s 

for all choices of the variables. After some simple manipulations one confirms that 
qipi{ej) = q'lpii^ej) for j ^ s + 1 and that qipii^.y) = q'ipii{Q,y). Thus ipi is parallel to 
ipii, contrary to the assumption that the system \E' = [ipijl^-^ has finite complexity. 

We have verified that it was valid to invoke the linear forms condition, provided that 
D is large enough. This completes the proof of Proposition I7.1f and hence that of 
Proposition I7.1[ □ 



Appendix D. Goldston-Yildirim correlation estimates 



One aim of this section is to construct a pseudorandom measure v such that a suitable 
multiple of v majorises the modified von Mangoldt function h.^^/. Specifically, we will 
prove Proposition 16.41 This was essentially carried out in |211 Chs. 9, 10], building on 
work of Goldston and Yildirim fTHl [IZl dHl IS], but the argument there only led to a 
majorant for one function A^ ^, whereas in the present work we need to simultaneously 
majorise AJ,^ wi ■ ■ ■ i^'h w- ^ small modifications to the argument in [21] would, 
however, achieve this. Another aim of this section is to prove f ll2.5l) . a crucial estimate 
on the Cowers norm of a certain truncated von Mangoldt function A^. This does not 
follow immediately from the results in [24], though can be proved using similar ideas. 
We take the opportunity to give a brief but more-or-less self-contained account of these 
ideas here, while also providing some simplifications. 

The heart of the matter is the establishment of correlation estimates for truncated 
divisor sums A-^ jj ^ : Z — M of the form 

In this expression i? is a moderately large number, which in practice will be a small 
power of A^, X • IK ^ is a smooth, compactly supported function, and a G N. In 
our applications we only ever take a = 1 or a = 2. We extend k.^,R,a to the negative 
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numbers in the obvious manner. Indeed, the compact support of x ensures that ^ 
is periodic. 

Remark. Observe that A^ /j ^ = x(0)'*logi? on "almost primes" - numbers coprime to 
rip^H^- purposes of gaining intuition about these functions one might think 

of them heuristically as being weights on the almost primes, though they do also have 
some weight on other numbers. The reason we need to deal with A^ /j_2(^) is to correct 
for the rather unfortunate fact that A^^/j_i(n) can be negative. This trick is of course 
closely related to the A^ sieve of Selberg. 



Associated to these truncated divisor sums are certain numbers which we call sieve 
factors. 

Definition D.l (Sieve factors). Let x : M ^ M be compactly supported and suppose 
that a ^ 1. Then we define the sieve factor c^^a by the formula 

where is the modified Fourier transform of Xi defined by the formula 

/•oo 

e'xix) = / ^(Oe-"« (D.2) 



The sieve factor ^ looks very complicated (though exphcitly computable), but in the 
special cases a = 1, 2 it has a particularly simple form: 

Lemma D.2. We have c^^i = — x'(0) and c^^2 = |x'(2^)P dx. More generally, c^^a 
is a real number. 



Proof. We deal first with the case a = 1. From (ID. II) and ( ID. 21) we have 



Cy,1 



/(I + tO^iO d^ = X(0) - ^ie^xix))U=o 



and the claim follows. Now we handle the case a = 2. We have 



Using the identity 

1 



2 + ^(e + e) 

we can rewrite c^^2 as 

^(0(l + ^Oe-^'+^«^^rfel dx. 



But from differentiating ( ID. 21) we see that the expression in parentheses is — x'(x), and 
the claim follows. 



Finally, for general a, we observe that since x is real, we have </?(— = ^^(0- Taking 
complex conjugates of fID.ip and substituting i-^ — we obtain the claim. □ 
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Roughly speaking, we will be able to show the analogue of the generalised Hardy- 
Littlewood conjecture for these sums A^,R,a so long as x is suitably smooth and R is a. 
sufficiently small power of A^. More precisely, we prove the following. 

Theorem D.3 (Goldston-Yildirim estimate). Let t,d,L be positive integers, let N be 
a large positive integer as usual, and let \I' = {ipi, . . . ,ipt) he a system of affine-linear 
forms with \\'^\\n ^ L. Assume that no two of the forms ipi are rational multiples of 
one another. Let a = (ai, . . . , a^) G N* he a t-tuple of integers. Let K C [— iV, be a 
convex body, and let xi, ■ ■ ■ , Xt '■ ^ he smooth, compactly supported functions. Let 
R = N'^ , where 7 > ^s sufficiently small depending on t, d, L, x and a. Call a prime p 
exceptional if there exist two forms which are linearly dependent modulo p, and 

let Pisi denote the set of all exceptional primes. Write X := Ylpep^ Then we have 

E n K,R,aXUn)) = n ^x.,a. ■ vol,(K) + 0(-^y^e«W), (D.3) 

neKnZ'ii€[t] i<=[t] p 

where the local factors (3p for each prime p were defined in (11.41) . (11.61) . and the sieve 
factors c^^a were defined in Definition W.R The implied constants here can depend on 
t,d,L,xi, ■■■,Xt and a. 

Remarks. Note that we are not assuming that the system \l/ has finite complexity but, 
as stated, we do assume that no two of the forms ipt are rational multiples of one 
another. This means that is finite but not necessarily bounded in terms of t, d, L. 
If, for example, we have d = 1, t = 2 and ipiln) = n, ip2{n) = n + M, then P$ can be 
somewhat large if M has many prime factors. If does have finite complexity then X 
is bounded in terms of t,d,L and the error term becomes o[N'^). In other situations 
this term can be more substantial. We have not attempted to find an error term which 
is best possible, being happy to settle for one that suffices for our apphcation, and in 
particular for the correlation condition (Definition 16.31) . 



Theorem ID. 31 should be compared with Conjecture 11.21 The space \E'~^((M"'")*), which 
appears in (11.41) . is not present here because the truncated divisor sums A-^^ extend 
periodically to the negative numbers, in contrast to the von Mangoldt function A. 

Remark. In the works of Goldston, Pintz and Yildirim [161 El [IHl [IHj the choice of 
cutoff X w^-s critically important. In our analysis it is not, ultimately because the 
inverse Gowers-norm conjecture GI(s) applies even for arbitrarily small S > 0. This 
allows us to use simpler and smoother enveloping sieves in which the sieve factors are 
large. We do, of course, require these factors to be independent of A^. In taking x to 
be very smooth, a number of simplifications are possible. Following notes of the second 
author [121 US] (see also [31]), we avoid the use of any deep facts from analytic number 
theory such as the classical zero-free region for the Riemann zeta function. One may 
instead make do with the elementary observation that the Riemann zeta function ({s) 
has the asymptotic ({s) = + 0(1) for s near 1 and 3fJ(s) > 1. We note that these 
simplifications could also be applied (retrospectively) to Chapters 9 and 10 of [23]. 

Remark. Observe that if i? = A^'^, then ^ A'{n) ^ ^^^Q^2 -^x,R,'i(^) alln, R < n ^ N. 
Thus we can use Theorem ID. 31 to obtain upper bounds for the expression (ll.7p which 

lose a multiplicative factor of ( -%7p- ) , which is independent of A^. This observation. 
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coupled with a good choice of x and 7, is rather close to the Selberg sieving technique. 
As is well-known there are significant barriers (the "parity problem") to reducing this 
multiplicative loss to something approaching 1. 

Prggf gf Theorem ID.31 To simplify the notation we allow all implicit constants 
to depend on t, ci, L, xi, . . . , xt and a. We may assume that (and hence R) are large 
with respect to these parameters, as the claim is trivial otherwise. 

It is convenient to introduce the index set 

n:={{i,3):ie[t]-3e[a,]}'Zfe. 
With this notation, it is a simple matter to expand the left-hand side of (ID. 30 as 




The jjL factors allow us to restrict mj to N*, the set of square- free natural numbers. 
If, for each i G [t], we set uii := lcm(mj^i, . . . ,mi^ai), then we can rewrite the above 
expression as 

Since x is compactly supported we may restrict rrii to be at most R^^^^ for all i. In 
particular if we set m := niG[t] then m ^ R^^^^ also. From the Chinese remainder 
theorem we see that as a function of n, the expression HiGit] '^m.i\ip^(n) is periodic with 
respect to the lattice m ■ l/'. By a volume packing argument similar to that used to 
prove fll.31) in Appendix [XJ we have 

E n ^^^IV-.W = vold(i^)a„^l,...,„^, + 0{mN'^-^) 
where ami,...,mt is the local factor 

ie[t] 

The total contribution of the error term 0{mN'^-^) to (iDlSj) can be estimated crudely by 
0{R'^^^^ N^~^ log* R), which will be o{N'^) if the exponent 7 that defines R is sufficiently 
small. Thus we can discard this term and reduce our task to that of showing that 




+ 0(e«Wlog-i/2°/?). (D.4) 

Note that we have eliminated the convex body K and the scale paramete A^. From the 
Chinese remainder theorem we make the key observation that ami,...,mt is multiplicative 

"'^^Observe that, although the o- notation concerns the situation when — + 00, this is exactly the 
same as letting i? — > 00. 
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in nil, • • • ; iTT't, so that if we decompose rrii = YlpP^'''' then 

«mi,..,mi = JJ^ «p>.i,...,p'>.t- (D-5) 
P 

Note that as the mjj are square-free, the Vp^i are either or 1. 

The next step is to use Fourier expansion to replace the weights Xi by more muhi- 
phcative functions. Indeed, as Xi is smooth and compactly supported we have the 
Fourier expansion flD.2p for some smooth ipi which is rapidly decreasing in the sense 
that <A (1 + 0"^ for all A>0. Thus we have 



'log ruij 



logR 



oo 



We could insert this Fourier expansion into (lD.4p directly, but it will be easier if we 
first take advantage of the rapid decrease of (fi to truncate the Fourier integral to the 
interval J := G M : |^| ^ log^''^ R} (say), thereby obtaining 

M^-T^) = I rrnPv^i^ de + O^K;/i°^«log-^i?) 



logi? 

for any A. Since Xi(logmjj/log-R) is itself bounded by 0{n\j^^°^^), we conclude that 



n ^^(iS') = /••• / n -■fv.tej^eM+o^fog'^^ n -^r^")^ 

(D.6) 

where we have written Zi^j := (1 + i^ij)/ log R. Let us first deal with the contribu- 
tion of the error term C'A(log"'^ i? j)Gf7 ^^°^^) to ( ID. 41) . Taking absolute values 
everywhere, we can bound these contributions by 



«A(l0gi?)0«-'^ J2 H ^ 



Using the multiplicativity, we can factorise this expression as an Euler product 
(logi?)«(l)-^ n Cipr,_^r,p-^^^.mnn.)n-^^ 

where rj := max(rj i, . . . ,'rj_a-). Crude computations then show that apn ^rt is equal 
to 1 when ri = . . . = = and 0(1 /p) otherwise (cf. Lemma [l.3p and hence we can 
bound the above expression by 



~0{1) 



residue 1, we see that 



pl+l/logi?' 

p 

pS 



Since the Riemann zeta function ((s) = Ylpi^ ~ ^) '^as a simple pole at s = 1 with 



V 



whenever 3fJ(s) > 1 and s — 1 is sufficiently close to 1. This allows us to bound the 
above expression by 

0^((logi?)«(i)-^) 
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which will be acceptable if A is large enough. Thus we only need to deal with the 
contribution of the main term of flD.6p to flD.4p . After swapping sums and integralfl, 
we write this term as 

log^R ... Yl K^i,j)m7,j^''(^rnu...,mt^ii^i,j) d^ij- 

Using the multiplicativity of a once more, we can write this expression as 

log* i? /".../" n ■ n (^-8) 

where ^ = {Cij)(i,j)en G and Ep^^ is the Euler factor 

(™ij)(ij)GnG{i,p}" (i,i)en 

Our task is to show that (fPlSll is ( Ureit] Cx»,aJ UpPv + 0(e^(^) log"^/^° R). To tackle 
this we must understand the Euler factors Ep^^. We may rewrite this expression as 

Bcn i' 

In this expression a{p,B) := ap'i , ^r^, where := 1 whenever G B for at least 

one j, and := otherwise. Note that 0) = 1. 

Call a set -B C i7 vertical if it is non-empty and contained inside a vertical fibre {«} x [oj] 
for some i G [if:]. If B is vertical then a{p,B) = Eng^dlplv-iC™)) which is equal to 1/p if 
P ^ Po{t,d,L) is sufficiently large. To say something about a{p,B) when i? is neither 
empty nor vertical, recall that we described a prime p as exceptional, and wrote p E P\f,, 
if there exist i, i' such that ipi is a multiple of tpii in Zp. For p ^ P-qi^we see from Lemma 
11.31 that a{p, B) = 0{l/p'^) whenever B is not vertical or empty. If p G then the 
best we can say in general is that a{p, B) = 0{l/p). 

From the above discussion we have 

Ep,5 = (l + 0(l/p2))E;^ forp^P^, (D.IO) 

where E'^ ^ is the Euler factoi@ 

_BCn,S vertical ^ ^ ' 



"'^'^One can justify the exchange of integrals and summations because / is compact, and the summation 
can be shown to be absolutely convergent, either by using the crude bounds above, or by using bounds 
such as (|D.10p below. 

^'^To provide a link to the discussion of [24j , we observe that 



ViK^i= n c(i+ E 

p BCO,_B vertical (i,j)eB 



(-1)1 
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For p E Pq, we must rely instead on the far weaker bound 

E,,^ = (1 + 0(1/p))E;,^. (D.12) 

From the estimate flD.7p and the fact that \zij\ = 0(log~^''^ R) when G / we have 



/ 1 X (-1)1^1 

n^;.= n iy — 

p BCn,_B vertical ^^(«,i)eB «J / 



= (l + 0(log-i/^i?)) n ( E ^^.:') • (D.13) 

_BCr2,_B vertical {i,j)&B 

Our aim now is to estabhsh a corresponding estimate for Hp-^p,?- Note that we cannot 
afford the loss of a multiplicative constant which would result from a naive apphcation 

of ^M- 

Proposition D.4 (Euler product estimate). We have 

n = f n + oie^'""' i-g"''" ^)) n 
p \ p / p 

for any ^ G J*^ . 

Proof. From Lemma [O] we have Pp = 1 + 0{l/p) for p E Pm and [3p = 1 + Oil/p^) 
otherwise. For starters this implies the very crude bound 

\{^^e<'^^\ (D.14) 

p 

which we will use later on. Our first main task is to dispose of the contribution of the 
large primes p, when (say) p > log^^^° R. Using the estimates for (]p just mentioned, we 
have 

n /?P^e^W. (D.15) 

p^logi/io R 

We also have 

n (3p^exp{0{ P''^) 
p>iogi/io/j p>iog^/^° R-.peP^ 

^exp (0(X log"^/^° R)) 
= l + O(e«Wlog-^/20^)^ 

where the last bound follows from the elementary inequality e^^ ^ 1 + Xe^ , valid for 
A ^ 1 and X G M^o- Similarly, using the inequality e~^^ ^ 1 — Xe^ , we obtain the 
corresponding lower bound, and thus 

n /5p = l + 0(e«(^)log-^/^°i?). (D.16) 

p>logl/10^ 

From this and (1D.14P we see that it will suffice to show that 



■n \ ^, 1 /in / n 



>s:iogi/"'ii 
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Now from fiDJOl) . ^DJ2^ we have 



Since E,>iogVio^p-2 ^ o(log^i/i° i?) and EpeP.:p>W/- i^P"' = 0{X log-'/'' R), we 
conclude that 

n i?p,^ = exp(O(l + X)log-i/20^) JJ ^/^^ 
= (l + 0(e^Wlog-i/2°/?)) n ^^C' 

p>logi/io 

the last step following as in the proof of (1D.16p . From this and (1D.14P we see that it 
suffices to show that 

n E,,=( n /?p+o(e«(^)iog-/2°/?)) n k,,- (d-i7) 

To do this, we will prove the following lemma. 
Lemma D.5. We have 



for allp^ logi/i°/2. 



Proof that Lemma \D.5\ implies (1D.17I) . Suppose first that there is po ^ logi/i° R such 
that = 0. Then, using the fact that Pp = 1 + 0{l/p), we have 

which is acceptable. If no Pp is vanishes then, since Pp = 1 + 0(l/p) and Pp is a rational 
with denominator dividing p'^, we have a bound Pp ^ 1 with the implied constant 
depending only on the global parameters t, d, L. Thus, using ( ID.lSp . we have 



n n n (^-«(;^)) 

= ( n /5p)-(l + 0(l0g-l/^i?)) 

= n /5p + 0(e^Wlog-i/^/?)). 

Thus f lD.lTp holds in this case also. □ 



Proof of Lemma \D.5[ Observe that since ^ G , we have 

pE(,,,)efl^»j = 1 + 0(logp/logi/2i?) 
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for all B and all p ^ log^/^° R. Dividing flDlOD by (IDTTD (noting that the latter has 
magnitude comparable to 1) and performing Taylor expansion in w = p^^^'j about 
w = 1 it is not hard to check that 

where Ep, E'^ are defined setting all the Zi^j equal to zero in fID.Qp and fID.lip respectively. 
Thus 

E,:=^(-l)l^la(p,i?) (D.18) 

B<zn 

and 

_BCn,B vertical ^ 

To prove the lemma, then, it suffices to prove the identity 

Pp = 5. (D.20) 

Recalling (ID.ISP and (]D.19|) . it will suffice to show that 

J2i-iraip,B)=Pp n (D.21) 

BCn BCn,_B vertical ^ 

Using the binomial theorem, the right-hand side of (lD.2ip simplifies to (3p{l — ^y, which 
by (11.60 is equal to 

By the inclusion-exclusion principle this can be written as 

ri,..,rtG{0,l} n=0 

which in turn is just 

ri,...,rte{0,l} 

We are to show that this is equal to the left-hand side of ( lD.2ip . namely 

^(-l)l^la(p,5). 

Bcn 

To do this, we compare coefficients of ap'-i,...,^^* on both sides. To evaluate the coefficient 
on the left-hand side, let / be the set of indices for which rj 7^ 0. Then this coefficient 
is easily seen to be 

n E (-i)'"-' 

which, by the binomial theorem, is simply (—1)1^1 This gives (ID.20p . and the claim 
follows. □ 



We return to the proof of flD.4p . Recall that we had reduced this to the task of finding 
an approproate asymptotic for fID.Sp . Substituting the result of Proposition ID. 41 into 
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fID.Sp and applying flD.14p . it is easy to reduce this in turn to showing the following two 
facts. Firstly, that 



log* R [■■■ fm K,d n ^^(^^.^o = n ^x.^, + o(iog-^/2° r)- (D.22) 

and secondly that 

log*/? [■■■ [u \K,i\ n = ^(1)- 

Let us begin with the second task, that of proving ( ID. 231) . We simply substitute Zij = 
(1 + i^ij) I log R into (ID. 131) . The contribution from the terms log R is precisely log~* /?, 
by a simple application of the binomial theorem ^bccb7^0(~-'-)'^' ~ O'*"' ~ -^^^ ^'^^ 
terms involving the we have the crude estimate 

n (i+i^M-ir^ 

and so 

However since x is smooth its modified Fourier transform satisfies \<fi{C,i,j)\ (1 + 
l^jjl)""^ for any A > 0, as we have already remarked. The claim then follows by taking 
A large enough. 

Now we prove (lD.22p . Using the rapid decay of the functions (p once more together with 
(1D.13P we see that it suffices to show that 

log*/? [... f n ( E ^m)^"'^""" n ^^(^m) d^^, 

J I "'^ BCQ,B vertical {i,j)&B (ij)Gf^ 

= n^X.a.+0(l0g-^/^°/?). 

ie[t] 

The first move is to reinstate the integrals over all of M, rather than just over /. Doing 
this introduces an error which is <^a ^og~^ R for any A > 0, on account of the rapid 
decrease of (p. Once this is done the multiple integral is easily seen to factor, there 
being one integral for each index i. After scaling out the factors of log/?, the claim 
follows from the definition (ID.ip of the sieve weights c^^a- The result follows, and we 



have concluded the proof of Theorem ID. 31 □ 



CONSTRUCTIGN GF THE ENVELOPING SIEVE. Now we are ready to prove Proposition 
16. 4[ the statement of which was as follows. 

Proposition 16.41 (Domination by a pseudorandom measure) . Let D > 1 be arbitrary. 
Then there is a constant Cq := Cq{D) such that the following is true. Let C ^ Cq, 
and suppose that N' G [CN, 2CN] . Let hi,. . . ,bt G {0, 1, . . . , IV — 1} be coprime to 
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W := np<ti;P- Then there exists a D -pseudorandom measure v : Zjv' — ^ which 
obeys the pointwise bounds 

1 + Kwi"") + ■■■+ KA^) ^D,c ^{n) 
for all n G [A^^/^, A^], where we identify n with an element ofL^' 'in ihe obvious manner. 

The definition of D-pseudorandom was given, and discussed, in ^ See in particular 
Definitions 16.21 and 16.31 and the paragraphs following the latter. Let 7 = 7(C, D) > be 
a parameter to be chosen later and set i? := A^'''. Fix an arbitrary smooth even function 
X : M — i> M which is supported on [—1, 1] and satisfies x(0) = 1 and = 1. 

For such a function we have c^^2 = 1, thanks to Lemma [D.2[ 

We define the preliminary weight v : [A^] by setting 

z>(n) := E,eM^^^AxA2(W^^ + k) 

and then transfer this to Z^v' by setting v{n) '■= \ + when n G [A^] and u{n) := 1 

otherwise. 

By construction, z/ is certainly non-negative. To verify the pointwise bounds, it suffices 
to show that 

for all i G [t] and n G [A^'^''^ , A^] . The left-hand side is only non-zero when Wn + bi is 
a prime which is greater than N^^^ . Supposing that 7 < 3/5, we see that in this case 
the left-hand side is equal to log A^, while the right-hand side is log R. Since 
R = N"^ and 7 depends only on C,D, the claim follows. 

It remains to show that z/ is a D-pseudorandom measure. Our argument here shall 
follow that in pi] rather closely, but will use Theorem ID. 31 as a substitute for pH 
Propositions 9.5,9.6]. For that reason we shall skip some of the details which are more 
or less exact repetition of those in [21]. 



Let us first verify the {D, D, Z})-linear forms condition. By decomposing u up into its 
various components as in [21], it certainly suffices to establish the somewhat general 
bound 

where = [tpi, . . . , ipm) is a system of affine-linear forms, no two of which are affinely 
related, m,d, \\'^\\n are all Od(1), and K C [—N^N^ is a convex body with '^{K) C 
[— A^, N]"^. Splitting i) up further, we thus reduce to showing that 

E ll^x,RAUWn + h^))=roUK) + o{N') (D.24) 



for all ii, . . . ,im & [t]. 
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Now we apply Theorem ID.3I As we are assuming that no two of the forms ipj{n) are 
affinely related, the same is true for the forms ipjiWn + hi.). In particular we see that 
the exceptional primes, if they exist, are bounded in size by 0{w) = O (log log A^). In 
particular we have X = 0(loglog^''^ A^) and so e*^*^"^-* log"^"^^*^ i? = o(l). We can thus 
write the left-hand side of flD.24p as 

where we suppress the dependence of constants on t, m, d, L, D. Because all the hi. are 
coprime to we see that /?p = {-^Y for all p ^ w, and in particular Ilp^wi/^p = 

{jPW)) ' ^OT p > w we see from Lemma 11.31 that (3p = 1 + 0(l/p^), and so 

rip>io/^p = 1 + o(l). Since c^^2 = 1, the claim follows. 

Now we verify the D-correlation condition for u. As before we can pass from u to u, 
and reduce to showing that 

^ n '^(^ + ^^■) ^ ^ Yi - ^i') 

nel je[m] l^j<j%m 

for all m = 0/5(1), all hi,...,hm G [A^], and all intervals / C [A^], and where r : 
[-A^, A^] M+ obeys the moment bounds E„g[_Ar,Ar] r(n)^ -Cg 1 for all g > 0. We 
may assume that no two of the hi are equal as in this case one can use crude divisor 
estimates, setting r(0) to be moderately large (see [21] for details). Again, we split up 
z/ and reduce to showing that 

whenever ii, . . . G [t]. We can apply Theorem ID. 31 with the system of forms = 
(W{n + hj) + &ij)jLi and write the left-hand side as 

As before we can discard the sieve factor c^ 2 = 1, and we have np<u;/^p ~ \1^W)) ' 

It thus suffices to show that 

n/?p + e'^(^Mog-V20i?« J2 r{h,-h,,). 

p>w l^j<j'^m 



From Lemma [T73] we see that for p > w we have jSp = l + 0{l/p), with the improvement 
Pp = 1 + 0(l/p^) as long as p ^ P^, that is as long as p does not divide W{hj — hj') + 
hi^ — hi^, for any 1 ^ j < j' ^ m. Thus 

p>w p>w " ^ p>w " ' 
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On the other hand, since w = O (log log A^) is so small we have 

eO(x) i^g~i/2o ^ ^ 5^ 4^)) log-V^o R 



pl/2 



«exp (oiY,^))- 



'p>W 

peP^ 



It follows from this analysis that if we set 



r,n):= exp (o( -L.)) 



then we obtain the desired correlation estimate. To show the moment bounds on r it 
suffices to show that 



1/2 

for all h = 0{W). By repeating the proof of [231 Lemma 9.9] we can deduce this bound 
from 



Using the bound 



ne[N] P>w 

p\Wn+h 



p\ 

we reduce to showing that 



P>w {d,W)=l 
p\Wn+h d\Wn+h 



{d,W)=l nG[Af] 

But we have 

E l=0(l + iV/rf) 

nG[Af] 
d\Wn+h 

by the Chinese remainder theorem, and the claim then follows easily. This concludes 
the proof of Proposition 16. 4[ □ 



The correlation estimate for A". The final task of this appendix is to establish 
the correlation estimate (112.51) . which was the crucial fact that ^y — 1 has small Gowers 
norm. We allow all constants to depend on s. Expanding out the U^^^[N] norm, it 
suffices to show the slightly more general bound 

E n (^^KW{n + u.h) + h)-l) = o{N^^^) 

{n,h)<^K u)&{0,lY+^ 
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whenever K is a. convex body in [— A^, A^]''^^. Expanding out the product, it suffices to 
show that 

E n + uj-h) + b)= YoU^iK) + o(iV^+2) 

{n,h)eKojeB 

for all B C {0, 1}^+^ Now observe that = -A^t,R,i, and so we may invoke Theorem 
ID.3I with the system of forms \E' = {W{n + uj ■ h) + h)^^B to write the left-hand side as 

( - \{^ + 0(Ar-^e«W/logV-i?). 



p 

\B\ 



As in the preceding section, we compute that i^p — \ '^n) ) ' ^^ile /3p = 1 + 

0(l/p^) for p > w. Furthermore all exceptional primes p have p ^ w, and thus since w 
is so small 

eOW/ i^gi/20 ^ ^ o(log-V^° R) exp (0( J] -^2)) = o^)- 



1/2 



Finally, from Lemma [D. 2 1 we have c^tt 1 = —1- The claim follows. □ 



Appendix E. Nilmanifold constraints; Host-Kra cube grgups 

Our aim in this appendix is prove Proposition [TT31 which asserts a constraint concerning 
parallelepiped in nilmanifolds. It turns out to be convenient to generalise the notion of 
a parallelepiped to a more general object, namely a Host-Kra cube. Thus much of this 
appendix will be devoted to the algebraic theory of these cubes. We will ffist introduce 
such parallelepipeds in the Lie group G, establish the constraint there, and then descend 
to the quotient space G/T and show that the constraint persists down to the quotient. 
In preparing the material that follows we benefitted much from conversations with Sasha 
Leibman, and also from remarks made by one of the anonymous referees. 

Hgst-Kra cube grgups in G. Let G be a connected Lie group with identity idc, 
with the associated lower central series G, given by 

G = Go = Gi ^ G2 ^ • . . , 

where Go = Gi = G and Gj+i = [G, Gj]. We recall the standard facts that [Gj, Gj] C 
Gi+j, and that each Gj is a closed connected normal Lie subgroup of G; see for instance 
O Ch. 3, §9, Corollary to Prop. 4]. In particular the quotient groups Gj\G are also 
Lie groups. 

To define the Host-Kra cube group HK*^^(G,) we ffist need some combinatorial nota- 
tion. 

Definition E.l (Simple combinatorics of {0, 1}^^^). We refer to {0, 1}*+^ as the cube. 
Its elements uj may be partially ordered by decreeing that uj ^ uj' if uj ^ uj'j for 
j = A hyperplane is any set of the form Hja '■= {uj : Uj = a}. If 

^ d ^ s + 1 then we say that a face of codimension d is any non-empty intersection F 
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of d distinct hyperplanes, and we write d = codim(F). Thus any vertex in {0, 1}''"^^ is a 
face of codimension s + 1, whilst the whole cube {0, 1}''"'"^ is a face of codimension 0. We 
say that two faces are parallel if they have the same fixed coordinates, and hence the 
same codimension. Every face F has a minimal element min(F) and a maximal element 
max(F). We say that a fac^ is lower if min(F) = 0''"'"^. Note that every face is parallel 
to exactly one lower face, and that lower faces F are in one-to-one correspondence 
with their maximal elements max(F), which can be arbitrary. Finally, we say that two 
parallel faces are adjacent if their union is a face of one lower codimension. 

Definition E.2 (Face groups). Let F C {0, 1}'^+^ be an face of codimension d. For 
any element g ^ G, we write for the element of G^^'^^"^^ such that {g^)i^ = g when 
u E F, and {g^)uj = idc otherwise. The face group Tp is the group generated by all 
elements g^ with g e Gcodim{F), thus Tp = Gcodim{F)- 

Definition E.3 (Host-Kra cube group). The Host-Kra cube group HK*'''^(G,) is the 
subgroup of G^O'i}^'^^ generated by all the face groups Tp, as F ranges over faces of 
{0,lF+i. 

The Host-Kra cube group could be defined with a more general filtration in place of the 
lower central series G,, that is to say a sequence of subgroups in which the condition 
that Gj+i = [G,Gi] is relaxed to an inclusion [Gi,Gj] C Gt+j. We will not need this 
here. 

The significance of the group HK*^^(G,) for us is that it contains the parallelepipeds: 

Lemma E.4 (Parallelepipeds are Host-Kra cubes). Given any g,x E G and n,hi, . . . , 
hg+i in Z, the parallelepiped g := {g"'~^'^''^x)^^(z[o^iys+i lies in HK^"'""^(G,). 

Proof. We may write, in G^^'^^"^^ , 

g = (/=+^)^=+H/1^= • • • (/^f H^7'^a;)^^ 

where Fq := {0, 1}'^+^ and Fi is the hyperplane Fj := {cj : cUj = 1} for z = 1, . . . , s + 1. 
Thus g is the product of s + 2 of the generators of HK*^^(G',). □ 

The face groups Gp are related to each other in a pleasant way: 
Lemma E.5 (Face relations). Let F,F' be faces in {0, 1}''+^. 

(i) IfF,F' are disjoint, then the elements inTp andVpi commute with one another. 

(ii) If F and F' intersect then \rp,Vpi] C F^^nF'- 

(iii) If F and F' are adjacent and parallel, then Tp C TpiTpyjpi andVpi C F^F^u^/. 

Proof. (i) is immediate. To prove (ii), note that any element of $[\Gamma_F, \Gamma_{F'}]$ has the form $x^{F \cap F'}$ for some $x \in [G_d, G_{d'}]$, where $d := \mathrm{codim}(F)$ and $d' := \mathrm{codim}(F')$. The result follows on noting that $\mathrm{codim}(F \cap F') \leq \mathrm{codim}(F) + \mathrm{codim}(F')$, and recalling from group theory that $[G_d, G_{d'}] \subseteq G_{d+d'} \subseteq G_{\mathrm{codim}(F \cap F')}$. (iii) is immediate; in this setting we have $x^F x^{F'} = x^{F \cup F'}$, and $G_{\mathrm{codim}(F)} \subseteq G_{\mathrm{codim}(F \cup F')}$ because $F \cup F'$ has one lower codimension. □

¹With respect to the partial ordering $\preceq$, a lower face is exactly the same concept as a principal filter.
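Relations (ii) and (iii) can be checked coordinatewise in the same illustrative Heisenberg model with $s + 1 = 3$. The model is an assumption made for this example, not part of the argument. In it, $G_2 = [G, G]$ is the centre $\{(0, 0, c)\}$, and the commutator of $(a, b, c)$ and $(a', b', c')$ is $(0, 0, ab' - a'b)$.

```python
from itertools import product

# Heisenberg group again: (a, b, c) encodes [[1, a, c], [0, 1, b], [0, 0, 1]].
ID = (0, 0, 0)
CUBE = list(product((0, 1), repeat=3))  # s + 1 = 3

def mul(x, y):
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2] + x[0] * y[1])

def inv(x):
    return (-x[0], -x[1], -x[2] + x[0] * x[1])

def comm(x, y):
    """[x, y] = x^{-1} y^{-1} x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))

def face_element(x, face):
    return {w: (x if w in face else ID) for w in CUBE}

def cube_mul(u, v):
    return {w: mul(u[w], v[w]) for w in CUBE}

def cube_comm(u, v):
    return {w: comm(u[w], v[w]) for w in CUBE}

g, h = (2, 1, 7), (-1, 3, 4)

# (ii): two intersecting hyperplanes F, F' (codimension 1 each).
F = {w for w in CUBE if w[0] == 1}
Fp = {w for w in CUBE if w[1] == 0}
lhs = cube_comm(face_element(g, F), face_element(h, Fp))
print(lhs == face_element(comm(g, h), F & Fp), comm(g, h))  # True (0, 0, 7)

# (iii): adjacent parallel faces of codimension 2, whose union has codimension 1.
z = (0, 0, 5)  # an element of G_2, the centre
E = {w for w in CUBE if w[0] == 0 and w[1] == 1}
Ep = {w for w in CUBE if w[0] == 1 and w[1] == 1}
union = face_element(z, E | Ep)
assert cube_mul(face_element(z, E), face_element(z, Ep)) == union
# z^E = (z^{-1})^{E'} z^{E u E'}, so z^E lies in Gamma_{E'} Gamma_{E u E'}
assert face_element(z, E) == cube_mul(face_element(inv(z), Ep), union)
```

The last assertion is exactly the rearrangement behind (iii): since $E$ and $E'$ are disjoint, $z^E$ and $z^{E'}$ commute.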



From Lemma E.5(iii) and an easy induction on the codimension, we see that every face group $\Gamma_F$ lies in the group generated by the lower face groups. In particular, the entire group $HK^{s+1}(G_\bullet)$ is generated by the lower face groups. The same result holds for the upper faces, but we will not make any further use of this here.

Now we seek a more explicit description of $HK^{s+1}(G_\bullet)$ in terms of the lower face groups. To achieve this, we need the following definition.

Definition E.6 (Decreasing ordering of faces). Let $F_1, \ldots, F_{2^{s+1}}$ be any ordering of the $2^{s+1}$ lower faces of $\{0,1\}^{s+1}$. We say that this ordering is decreasing if whenever $F_i \supseteq F_j$ we have $i \leq j$. Thus $F_1 = \{0,1\}^{s+1}$ and $F_{2^{s+1}} = \{0^{s+1}\}$.
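As a quick sanity check that decreasing orderings exist, one can index each lower face $F$ by $\max(F)$ and sort by decreasing number of ones, that is, by decreasing dimension. This is one admissible choice among several, picked here for illustration, with $s + 1 = 3$.

```python
from itertools import product

DIM = 3  # s + 1
# A lower face F is determined by m = max(F): F = {omega : omega <= m coordinatewise}.
lower_maxima = list(product((0, 1), repeat=DIM))

def contains(m1, m2):
    """Does the lower face with maximum m1 contain the one with maximum m2?"""
    return all(b <= a for a, b in zip(m1, m2))

def is_decreasing(order):
    """Definition E.6: F_i contains F_j only if i <= j."""
    return all(i <= j
               for i, mi in enumerate(order)
               for j, mj in enumerate(order)
               if contains(mi, mj))

# One admissible choice: sort by decreasing dimension (number of ones in max(F)).
order = sorted(lower_maxima, key=lambda m: -sum(m))
print(order[0], order[-1])  # (1, 1, 1) (0, 0, 0)
print(is_decreasing(order), is_decreasing(order[::-1]))  # True False
```

Strict containment of lower faces strictly increases the number of ones in the maximal element, which is why sorting on that count always works.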

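Clearly, decreasing orderings of faces exist; let us fix such an ordering. Now observe from Lemma E.5(i),(ii) that if $i < j$, then we either have $\Gamma_{F_j} \cdot \Gamma_{F_i} \subseteq \Gamma_{F_i} \cdot \Gamma_{F_j}$, or $\Gamma_{F_j} \cdot \Gamma_{F_i} \subseteq \Gamma_{F_i} \cdot \Gamma_{F_j} \cdot \Gamma_{F_k}$ for some $k > j$. (Any two lower faces meet at $0^{s+1}$, so $F_i \cap F_j$ is a lower face contained in $F_j$: either $F_j$ itself or some $F_k$ with $k > j$. One then writes $ab = ba[a,b]$ for $a \in \Gamma_{F_j}$ and $b \in \Gamma_{F_i}$, where $[a,b] = a^{-1}b^{-1}ab$.) From these inclusions we see that any product of elements from the lower face groups $\Gamma_{F_i}$ is eventually contained in $\Gamma_{F_1} \cdot \Gamma_{F_2} \cdots \Gamma_{F_{2^{s+1}}}$. One uses the above inclusions to move all occurrences of $\Gamma_{F_1}$ to the far left, uses the closure property $\Gamma_{F_1} \cdot \Gamma_{F_1} = \Gamma_{F_1}$ to concatenate them, then moves all occurrences of $\Gamma_{F_2}$ to be adjacent to $\Gamma_{F_1}$, and so forth. Since the lower face groups generate $HK^{s+1}(G_\bullet)$, we have thus obtained the factorisation

$$HK^{s+1}(G_\bullet) = \Gamma_{F_1} \cdot \Gamma_{F_2} \cdots \Gamma_{F_{2^{s+1}}}.$$

Thus there exist functions $r_i : HK^{s+1}(G_\bullet) \to \Gamma_{F_i}$ such that

$$\mathbf{g} = r_1(\mathbf{g}) \cdots r_{2^{s+1}}(\mathbf{g}) \qquad \text{(E.1)}$$

for all $\mathbf{g} \in HK^{s+1}(G_\bullet)$.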

Remark. Since $\Gamma_{F_i} \cong G_{\mathrm{codim}(F_i)}$ is a closed connected Lie subgroup of $G^{\{0,1\}^{s+1}}$, we can conclude from the above factorisation that $HK^{s+1}(G_\bullet)$ is also a closed connected Lie subgroup of $G^{\{0,1\}^{s+1}}$. Furthermore, the hyperplane face groups consist entirely of parallelepipeds, and the face groups of lower-dimensional faces can be built up from commutators of elements of the hyperplane face groups. Hence $HK^{s+1}(G_\bullet)$ is in fact the subgroup generated by the parallelepipeds. Thus this is an extremely natural group for studying parallelepipeds.

In the factorisation (E.1), the $r_i$ are unique. An inspection of the $\max(F_1)$ coefficients of both sides shows that $r_1(\mathbf{g})$ is determined uniquely by $\mathbf{g}$. Then, after factoring $r_1(\mathbf{g})$ out, an inspection of the $\max(F_2)$ coefficients of both sides shows that $r_2(\mathbf{g})$ is determined uniquely by $\mathbf{g}$, and so forth. (The vertex $\max(F_i)$ lies in $F_j$ only when $F_i \subseteq F_j$, which forces $j \leq i$.) Indeed, this algorithm shows that if $\mathbf{g} = (g_\omega)_{\omega \in \{0,1\}^{s+1}}$, then $r_i(\mathbf{g}) \in \Gamma_{F_i}$ is a continuous function of the coordinates $g_{\max(F_1)}, \ldots, g_{\max(F_i)}$ only. More precisely, identifying $\Gamma_{F_i}$ with $G_{\mathrm{codim}(F_i)}$, the group element $r_i(\mathbf{g})$ is an explicit word in these coordinates. Conversely, $g_{\max(F_i)}$ is a word in $r_1(\mathbf{g}), \ldots, r_i(\mathbf{g})$ only.
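The peeling algorithm just described can be sketched in the illustrative Heisenberg model with $s = 2$, where $G_2$ is the centre and $G_3$ is trivial. This model is an assumption made only for the example. Lower faces are indexed by their maximal elements and ordered by decreasing number of ones. The sketch applies the algorithm to a parallelepiped, which lies in $HK^{3}(G_\bullet)$ by Lemma E.4. The factors attached to faces of codimension 2 should then come out central, and the factor attached to $\{0^{3}\}$ should be trivial. The names `lower_factorisation` and `rebuild` are hypothetical.

```python
from itertools import product

# Illustrative model: Heisenberg group, (a, b, c) <-> [[1, a, c], [0, 1, b], [0, 0, 1]].
ID = (0, 0, 0)
DIM = 3  # s + 1 with s = 2
CUBE = list(product((0, 1), repeat=DIM))
ORDER = sorted(CUBE, key=lambda m: -sum(m))  # lower faces by max(F), decreasing

def mul(x, y):
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2] + x[0] * y[1])

def inv(x):
    return (-x[0], -x[1], -x[2] + x[0] * x[1])

def power(x, n):
    base, result = (x if n >= 0 else inv(x)), ID
    for _ in range(abs(n)):
        result = mul(result, base)
    return result

def in_face(w, m):
    """Is the vertex w in the lower face with maximal element m?"""
    return all(a <= b for a, b in zip(w, m))

def lower_factorisation(g):
    """Return v with r_i(g) = v[max(F_i)]^{F_i}: read off g at max(F_i)
    after removing the factors r_j(g), j < i, already found."""
    v = {}
    for i, m in enumerate(ORDER):
        prefix = ID
        for mj in ORDER[:i]:
            if in_face(m, mj):  # the earlier face F_j contains max(F_i)
                prefix = mul(prefix, v[mj])
        v[m] = mul(inv(prefix), g[m])
    return v

def rebuild(v):
    """Coordinatewise product r_1(g) r_2(g) ... r_{2^{s+1}}(g)."""
    out = {}
    for w in CUBE:
        acc = ID
        for m in ORDER:
            if in_face(w, m):
                acc = mul(acc, v[m])
        out[w] = acc
    return out

# A parallelepiped (g^{n + omega.h} x)_omega; it lies in HK^{s+1} by Lemma E.4.
g, x, n, h = (1, 1, 0), (2, -1, 3), 1, (2, -1, 1)
cube = {w: mul(power(g, n + sum(a * b for a, b in zip(w, h))), x) for w in CUBE}

v = lower_factorisation(cube)
central = all(v[m][:2] == (0, 0) for m in ORDER if sum(m) == 1)  # codimension 2
print(rebuild(v) == cube, central, v[(0, 0, 0)] == ID)  # True True True
```

Reconstruction succeeds for any input, by construction. The centrality and triviality checks are where the theory enters: they hold because the input lies in $HK^{3}(G_\bullet)$.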






Recall that we are aiming to prove Proposition 11.5, which establishes a constraint amongst the $2^{s+1}$ vertices of a parallelepiped in $G/\Gamma$, an $s$-step nilmanifold. Henceforth we assume that we are in this setting (the discussion up to now has been valid quite generally). The preceding observations allow one to prove a related fact: if $G$ is $s$-step nilpotent and $(g_\omega)_{\omega \in \{0,1\}^{s+1}} \in HK^{s+1}(G_\bullet)$, then $g_{0^{s+1}}$ is a word in the $g_\omega$, $\omega \in \{0,1\}^{s+1} \setminus \{0^{s+1}\}$. Indeed, the nilpotence of $G$ implies that the final face group $\Gamma_{F_{2^{s+1}}} \cong G_{s+1}$ is trivial, and hence $r_{2^{s+1}}(\mathbf{g}) = \mathrm{id}$ for all $\mathbf{g}$. Thus $g_{0^{s+1}} = g_{\max(F_{2^{s+1}})}$ is actually a word in $r_1(\mathbf{g}), \ldots, r_{2^{s+1}-1}(\mathbf{g})$, and hence in the $g_\omega$, $\omega \in \{0,1\}^{s+1} \setminus \{0^{s+1}\}$.

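To prove Proposition 11.5 we must show how this constraint "descends" to $G/\Gamma$. A step in this direction is the following lemma. It follows immediately from the fact that $g_{0^{s+1}}$ is a word in the $g_\omega$, $\omega \in \{0,1\}^{s+1} \setminus \{0^{s+1}\}$, and that $\Gamma$ is a group.

Lemma E.7. Suppose that $\mathbf{g} = (g_\omega)_{\omega \in \{0,1\}^{s+1}} \in HK^{s+1}(G_\bullet)$ and that $g_\omega \in \Gamma$ for all $\omega \in \{0,1\}^{s+1} \setminus \{0^{s+1}\}$. Then the remaining point $g_{0^{s+1}}$ lies in $\Gamma$ as well.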
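The word expressing $g_{0^{s+1}}$ in terms of the other vertices can be made explicit. The sketch again uses the illustrative Heisenberg model with $s = 2$. It runs the peeling algorithm on every lower face except $\{0^{s+1}\}$ and then multiplies the resulting factors together, which is legitimate because every lower face contains $0^{s+1}$ and $r_{2^{s+1}}$ is trivial. Rational entries, via Python's `fractions` module, show that the identity is exact. The name `predict_origin` is hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

ID = (0, 0, 0)
DIM = 3  # s + 1, with G the (2-step nilpotent) Heisenberg group
CUBE = list(product((0, 1), repeat=DIM))
ORDER = sorted(CUBE, key=lambda m: -sum(m))  # decreasing order of lower faces
ORIGIN = (0,) * DIM

def mul(x, y):
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2] + x[0] * y[1])

def inv(x):
    return (-x[0], -x[1], -x[2] + x[0] * x[1])

def power(x, n):
    base, result = (x if n >= 0 else inv(x)), ID
    for _ in range(abs(n)):
        result = mul(result, base)
    return result

def in_face(w, m):
    return all(a <= b for a, b in zip(w, m))

def predict_origin(others):
    """Compute g_{0^{s+1}} from the other 2^{s+1} - 1 vertices, using the fact that
    r_{2^{s+1}} is trivial; `others` never needs the key ORIGIN."""
    v = {}
    for i, m in enumerate(ORDER[:-1]):  # ORDER[-1] is ORIGIN
        prefix = ID
        for mj in ORDER[:i]:
            if in_face(m, mj):
                prefix = mul(prefix, v[mj])
        v[m] = mul(inv(prefix), others[m])
    word = ID
    for m in ORDER[:-1]:  # every lower face contains the origin
        word = mul(word, v[m])
    return word

g = (Fr(1, 2), Fr(-3, 4), Fr(2, 3))
x = (Fr(5, 7), Fr(1, 3), Fr(-1, 5))
n, h = 1, (2, -1, 1)
cube = {w: mul(power(g, n + sum(a * b for a, b in zip(w, h))), x) for w in CUBE}
others = {w: cube[w] for w in CUBE if w != ORIGIN}
print(predict_origin(others) == cube[ORIGIN])  # True
```

The prediction uses only `mul` and `inv`, both of which send integer triples to integer triples. Integral input vertices therefore give an integral output, mirroring Lemma E.7 when $\Gamma$ is taken to be the integer Heisenberg group.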



We have defined the Host-Kra cube group; now we define the Host-Kra nilmanifold. 

Definition E.8 (Host-Kra cube nilmanifold). We define the Host-Kra nilmanifold $HK^{s+1}(G_\bullet/\Gamma)$ to be the $s$-step nilmanifold $HK^{s+1}(G_\bullet) / (\Gamma^{\{0,1\}^{s+1}} \cap HK^{s+1}(G_\bullet))$.



A priori, it is not clear that this definition makes sense. The Lie group $HK^{s+1}(G_\bullet)$ is a connected, simply-connected, $s$-step nilpotent Lie group. The nilpotence follows from the fact that it is a subgroup of $G^{\{0,1\}^{s+1}}$, and the simple-connectedness from the factorisation (E.1) together with the simple-connectedness of the face groups $\Gamma_{F_i} \cong G_{\mathrm{codim}(F_i)}$. We have not, however, shown that $\Gamma^{\{0,1\}^{s+1}} \cap HK^{s+1}(G_\bullet)$ is cocompact inside it. This is the business of Lemma E.10 below. To prove it, we will need a basic topological property of nilmanifolds, first established in the foundational paper of Mal'cev [37].

Lemma E.9 [33]. Let $G$ be a connected, simply-connected nilpotent Lie group, and let $\Gamma$ be a discrete cocompact subgroup. Then for any $j \geq 1$ the group $\Gamma \cap G_j$ is discrete and cocompact in $G_j$.

Remark. To obtain results such as the Main Theorem in the case $s = 2$, we need only consider nilmanifolds which are products of Heisenberg examples. This was observed in Proposition 8.4. In this case, Lemma E.9 can easily be verified by hand using calculations along the lines of those in [27, Appendix B].
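For the Heisenberg case mentioned in the remark, the decomposition $G_j = K_j \cdot (\Gamma \cap G_j)$ used in the next proof can be written down explicitly. The sketch below assumes $G$ is the real Heisenberg group and $\Gamma$ the subgroup of integer triples, taking $K_1$ and $K_2$ to be closed unit boxes. These are choices made for illustration, and the calculation in [27, Appendix B] may be organised differently. The helper names are hypothetical.

```python
import math
from fractions import Fraction as Fr

# Assumed model: G = real Heisenberg group, (a, b, c) <-> [[1, a, c], [0, 1, b], [0, 0, 1]],
# with Gamma the subgroup of integer triples.
def mul(x, y):
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2] + x[0] * y[1])

def inv(x):
    return (-x[0], -x[1], -x[2] + x[0] * x[1])

def split_g1(g):
    """Write g = k * gamma with gamma in Gamma and k in [0, 1)^3 (the case j = 1)."""
    a, b, c = g
    m, n = math.floor(a), math.floor(b)
    # third coordinate of g * gamma^{-1} is c - p + m*n - a*n; choose p to put it in [0, 1)
    p = math.floor(c + m * n - a * n)
    gamma = (m, n, p)
    return mul(g, inv(gamma)), gamma

def split_g2(z):
    """Same for G_2 = {(0, 0, c)}, where Gamma ∩ G_2 = {(0, 0, p) : p integer}."""
    p = math.floor(z[2])
    return (0, 0, z[2] - p), (0, 0, p)

g = (Fr(7, 3), Fr(-5, 2), Fr(11, 4))
k, gamma = split_g1(g)  # k = (1/3, 1/2, 3/4), gamma = (2, -3, 3)
assert mul(k, gamma) == g and all(0 <= t < 1 for t in k)
```

Since every $g$ is $k\gamma$ with $k$ in the unit box, the closed box is a compact set $K_1$ with $G = K_1 \cdot \Gamma$. The same holds for $K_2$ inside the centre.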

Lemma E.10. $\Gamma^{\{0,1\}^{s+1}} \cap HK^{s+1}(G_\bullet)$ is a discrete and cocompact subgroup of $HK^{s+1}(G_\bullet)$.



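Proof. The discreteness is obvious, since $\Gamma^{\{0,1\}^{s+1}}$ is discrete in $G^{\{0,1\}^{s+1}}$. Now by Lemma E.9 there is a compact set $K_j \subseteq G_j$ such that $G_j = K_j \cdot (\Gamma \cap G_j)$. For each $i$, consider the subgroup $H_i \leq HK^{s+1}(G_\bullet)$ consisting of those $\mathbf{g}$ such that $r_1(\mathbf{g}) = \cdots = r_i(\mathbf{g}) = \mathrm{id}$, so that $H_0 = HK^{s+1}(G_\bullet)$. By our earlier observations this is the same as the subgroup $\{\mathbf{g} : g_{\max(F_1)} = \cdots = g_{\max(F_i)} = \mathrm{id}\}$, and hence in particular $H_i$ is normal in $HK^{s+1}(G_\bullet)$.

Suppose that $1 \leq i \leq 2^{s+1}$ and that $\mathbf{g} \in H_{i-1}$. Then $\mathbf{g} = r_i(\mathbf{g}) \mathbf{h}$, where $\mathbf{h} \in H_i$. Since $r_i(\mathbf{g})$ lies in the face group $\Gamma_{F_i}$, we may write it as $(k_i \gamma_i)^{F_i}$, where $k_i \in K_{\mathrm{codim}(F_i)}$ and $\gamma_i \in \Gamma \cap G_{\mathrm{codim}(F_i)}$. Since $H_i$ is normal, we may hence write

$$\mathbf{g} = (k_i)^{F_i} \, \mathbf{h}' \, (\gamma_i)^{F_i},$$

where $\mathbf{h}' := (\gamma_i)^{F_i} \mathbf{h} ((\gamma_i)^{F_i})^{-1}$ is another element of $H_i$.

Continuing inductively until $i = 2^{s+1}$, we eventually express an arbitrary element of $HK^{s+1}(G_\bullet)$ as a product $(k_1)^{F_1} \cdots (k_{2^{s+1}})^{F_{2^{s+1}}}$ times an element of $\Gamma^{\{0,1\}^{s+1}} \cap HK^{s+1}(G_\bullet)$. Since the set

$$\{(k_1)^{F_1} \cdots (k_{2^{s+1}})^{F_{2^{s+1}}} : k_i \in K_{\mathrm{codim}(F_i)} \text{ for } 1 \leq i \leq 2^{s+1}\}$$

is compact, this proves the lemma. □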



Proof of Proposition 11.5. The projection $G^{\{0,1\}^{s+1}} \to (G/\Gamma)^{\{0,1\}^{s+1}}$ induces a one-to-one continuous map from the compact set $HK^{s+1}(G_\bullet/\Gamma)$ to $(G/\Gamma)^{\{0,1\}^{s+1}}$. Henceforth, we consider the former set as a compact subset of the latter. Let $p$ be the restriction to $HK^{s+1}(G_\bullet/\Gamma)$ of the obvious projection from $(G/\Gamma)^{\{0,1\}^{s+1}}$ to $(G/\Gamma)^{\{0,1\}^{s+1} \setminus \{0^{s+1}\}}$, and let $\Sigma$ be the range of this map. It follows from Lemma E.7 that this map is one-to-one. Hence there is a unique map $P : \Sigma \to G/\Gamma$ such that $(P(x), x) \in HK^{s+1}(G_\bullet/\Gamma)$ for every $x = (x_\omega)_{\omega \in \{0,1\}^{s+1} \setminus \{0^{s+1}\}} \in \Sigma$. The map $P$ is automatically continuous, since its graph $HK^{s+1}(G_\bullet/\Gamma)$ is compact and all spaces involved are Hausdorff.

Proposition 11.5 follows immediately from Lemma E.4. □